A Corporate Dedication to
Quality and Reliability


National Semiconductor is an industry leader in the 
manufacture of high quality, high reliability integrated 
circuits. We have been the leading proponent of driv- 
ing down IC defects and extending product lifetimes. 
From raw material through product design, manufacturing
and shipping, our quality and reliability are second
to none.

We are proud of our success ... it sets a standard for
others to achieve. Yet, our quest for perfection is on- 
going so that you, our customer, can continue to rely 
on National Semiconductor Corporation to produce 
high quality products for your design systems. 
President, Chief Executive Officer
National Semiconductor Corporation 


We Are Committed to Quality and Reliability


National Semiconductor Corporation is a leader in the manufacture
of integrated circuits of high quality and high reliability.
National Semiconductor has always been a pioneer in reducing the
number of IC failures and improving product lifetimes. From raw
material through design and manufacturing to delivery, the quality
and reliability of National Semiconductor products are unsurpassed.


We are proud of our success, which sets standards that others
strive to reach. Your requirements, too, are constantly rising.
You, our customer, can continue to rely on National Semiconductor.


Quality and Reliability:
A Shared Vocation at National
Semiconductor Corporation


National Semiconductor Corporation is one of the industry leaders
in manufacturing integrated circuits of very high quality and
exceptional reliability. National was the first to set out to bring
down the number of defective integrated circuits and to increase
product lifetimes. From raw materials through product design,
manufacturing and shipping, quality and reliability at National are
without equal.


We are proud of our success, and the standard thus defined should
become the goal for other companies to reach. We continue to push
forward our pursuit of perfection; as a result, you, our customer,
can always rely on National Semiconductor Corporation when
producing systems of a very high standard of quality.


A Corporate Commitment to Quality and
Reliability


National Semiconductor Corporation is a top-ranking company in the
manufacture of high quality, high reliability integrated circuits.
National has been the leading promoter of reducing integrated
circuit defects and extending product lifetimes. From raw material
through every phase of design, manufacturing and shipping,
National's quality and reliability are second to none.


We are proud of our success, which sets a goal for others to reach.
Our desire for perfection is nevertheless unlimited, and so you,
our customer, can continue to rely on National Semiconductor
Corporation for the production of your systems with high levels of
quality.
LIFE SUPPORT POLICY


NATIONAL’S PRODUCTS ARE NOT AUTHORIZED FOR USE AS CRITICAL COMPONENTS IN LIFE SUPPORT DEVICES OR 
SYSTEMS WITHOUT THE EXPRESS WRITTEN APPROVAL OF THE PRESIDENT OF NATIONAL SEMICONDUCTOR CORPORA- 
TION. As used herein: 


1. Life support devices or systems are devices or systems 
which, (a) are intended for surgical implant into the body, 
or (b) support or sustain life, and whose failure to per- 
form, when properly used in accordance with instructions 
for use provided in the labeling, can be reasonably ex- 
pected to result in a significant injury to the user. 


2. A critical component is any component of a life support 
device or system whose failure to perform can be reason- 
ably expected to cause the failure of the life support de- 
vice or system, or to affect its safety or effectiveness. 


National Semiconductor Corporation 2900 Semiconductor Drive, P.O. Box 58090, Santa Clara, California 95052-8090 (408) 721-5000 
TWX (910) 339-9240


National does not assume any responsibility for use of any circuitry described, no circuit patent licenses are implied, and National reserves the right, at any time 
without notice, to change said circuitry or specifications. 


Introduction 


Dear Customer, 


Introduction of the NS32CG16 marks a major milestone 
in the continuing evolution of the Series 32000® family 
of high performance 32-bit microprocessors. With the 
NS32CG16, your system can be powered with a 32-bit 
processor optimized for embedded control applica- 
tions. 

The NS32CG16 offers high integration, the perform- 
ance of a fully programmable 32-bit microprocessor 
and graphics support—all on one chip. Our endeavor 
has been to design a microprocessor with the system 
designer’s needs in mind. We hope you will benefit 
from this effort. 

National also offers an array of VLSI solutions for pe- 
ripheral functions, from DRAM controllers to single-chip 
SCSI controllers and Ethernet controllers. With this of- 
fering we hope to meet all of your VLSI needs. 




Richard L. Sanquini 
Division Vice President 
Micro Systems Group 
PRELIMINARY


High-Performance Printer/Display Processor 


General Description 


The NS32CG16 is a 32-bit microprocessor in the Series 
32000® family that provides special features for graphics 
applications. It is specifically designed to support page oriented
printing technologies such as Laser, LCS, LED, Ion-Deposition
and InkJet.


The NS32CG16 provides a 16 Mbyte linear address space 
and a 16-bit external data bus. It also has a 32-bit ALU, an 
eight-byte prefetch queue, and a slave processor interface. 


The capabilities of the NS32CG16 can be expanded by us- 
ing an external floating point unit which interfaces to the 
NS32CG16 as a slave processor. This combination pro- 
vides optimal support for outline character fonts. 


The NS32CG16's highly efficient architecture, together with
its built-in support for BITBLT (BIT-aligned BLock Transfer)
operations and other special graphics functions, makes the
device well suited to handling a variety of page description
languages such as PostScript™ and PCL™.


Features
■ Software compatible with the Series 32000 family
■ 32-bit architecture and implementation
■ 16 Mbyte linear address space
■ Special support for graphics applications
— 18 graphics instructions
— Binary compression/expansion capability for font
storage using RLL encoding
— Pattern magnification for Epson and HP LaserJet™
emulations
— 6 BITBLT instructions on chip
— Interface to an external BITBLT processing unit for
very fast BITBLT operations (optional)
■ Floating point support via the NS32081 or the NS32381
for outline fonts, scaling and rotation
■ On-chip clock generator
■ Optimal interface to large memory arrays via the
DP84xx family of DRAM controllers
■ Power save mode
■ High-speed CMOS technology
■ 68-pin plastic PCC package


1.0 Product Introduction 


The NS32CG16 is a high speed CMOS microprocessor in 
the Series 32000 family. It is software compatible with all 
the other CPUs in the family. The device incorporates all of 
the Series 32000 advanced architectural features, with the 
exception of the virtual memory capability. 


Brief descriptions of the NS32CG16 features that are 
shared with other members of the family are provided be- 
low: 


Powerful Addressing Modes. Nine addressing modes 
available to all instructions are included to access data 
structures efficiently. 


Data Types. The architecture provides for numerous data 
types, such as byte, word, doubleword, and BCD, which may 
be arranged into a wide variety of data structures. 


Symmetric Instruction Set. While avoiding special case 
instructions that compilers can’t use, the Series 32000 fami- 
ly incorporates powerful instructions for control operations, 
such as array indexing and external procedure calls, which 
save considerable space and time for compiled code. 


Memory-to-Memory Operations. The Series 32000 CPUs
represent two-address machines. This means that each
operand can be referenced by any one of the addressing
modes provided.


This powerful memory-to-memory architecture permits 
memory locations to be treated as registers for all useful 
operations. This is important for temporary operands as well 
as for context switching. 


Large, Uniform Addressing. The NS32CG16 has 24-bit
address pointers that can address up to 16 megabytes
without any segmentation; this addressing scheme provides
flexible memory management without add-on expense.


Modular Software Support. Any software package for the 
Series 32000 family can be developed independent of all 
other packages, without regard to individual addressing. In 
addition, ROM code is totally relocatable and easy to ac- 
cess, which allows a significant reduction in hardware and 
software cost. 


Software Processor Concept. The Series 32000 architec- 
ture allows future expansions of the instruction set that can 
be executed by special slave processors, acting as exten- 
sions to the CPU. This concept of slave processors is 
unique to the Series 32000 family. It allows software com- 
patibility even for future components because the slave 
hardware is transparent to the software. With future ad- 
vances in semiconductor technology, the slaves can be 
physically integrated on the CPU chip itself. 


To summarize, the architectural features cited above pro- 
vide three primary performance advantages and character- 
istics: 

• High-Level Language Support

• Easy Future Growth Path

• Application Flexibility

1.1 NS32CG16 SPECIAL FEATURES 


In addition to the above Series 32000 features, the 
NS32CG16 provides features that make the device ex- 
tremely attractive for a wide range of applications where 
graphics support, low chip count, and low power consump- 
tion are required. 


The most relevant of these features are the graphics sup- 
port capabilities, that can be used in applications such as 
printers, CRT terminals, and other varieties of display sys- 
tems, where text and graphics are to be handled. 


Graphics support is provided by eighteen instructions that 
allow operations such as BITBLT, data compression/expan- 
sion, fills, and line drawing, to be performed very efficiently. 
In addition, the device can be easily interfaced to an exter- 
nal BITBLT Processing Unit (BPU) for high BITBLT perform- 
ance. 


The NS32CG16 allows systems to be built with a relatively 
small amount of random logic. The bus is highly optimized 
to allow simple interfacing to a large variety of DRAMs and 
peripheral devices. All the relevant bus access signals and 
clock signals are generated on-chip. The cycle extension 
logic is also incorporated on-chip. 


The device is fabricated in a low-power, double-poly, single
metal, CMOS technology. It also includes a power-save feature
that allows the clock to be slowed down under software
control, thus minimizing the power consumption. This feature
can be used in those applications where power saving
during periods of low performance demand is highly desirable.


The bus characteristics and the power save feature are de- 
scribed in the “Functional Description” section. A general 
overview of BITBLT operations and a description of the 
graphics support instructions is provided in Section 2.4. De- 
tails on all the NS32CG16 instructions can be found in the 
NS32CG16 Printer/Display Processor Programmer’s Refer- 
ence Supplement and the related NS32CG16 supplement. 


Below is a summary of the instructions that are directly ap- 
plicable to graphics along with their intended use. 


Instruction   Application

BBAND         The BitBlt group of instructions provide a method of
BBOR          quickly imaging characters, creating patterns,
BBFOR         windowing and other block oriented effects.
BBXOR
BBSTOD
BITWT
EXTBLT

MOVMP         Move Multiple Pattern is a very fast instruction for
              clearing memory and drawing patterns and lines.

TBITS         Test Bit String will measure the length of 1's or 0's
              in an image, supporting many data compression methods
              (RLL). TBITS may also be used to test for boundaries
              of images.

SBITS         Set Bit String is a very fast instruction for filling
              objects, outline characters and drawing horizontal
              lines.

              The TBITS and SBITS instructions support Group 3 and
              Group 4 CCITT communications (FAX).

SBITPS        Set Bit Perpendicular String is a very fast
              instruction for drawing vertical, horizontal and
              45° lines.

              In printing applications SBITS and SBITPS may be used
              to express portrait and landscape respectively from
              the same compressed font data. The size of the
              character may be scaled as it is drawn.

SBIT          The Bit group of instructions enable single pixels
CBIT          anywhere in memory to be set, cleared, tested or
TBIT          inverted.
IBIT

INDEX         The INDEX instruction combines a multiply-add
              sequence into a single instruction. This provides a
              fast translation of an X-Y address to a pixel
              relative address.
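
As a rough illustration of two of the operations summarized above, the following Python sketch models what a multiply-add address translation and a run-length measurement compute. It is not NS32CG16 code: the function names, the row-major pixel layout and the list-of-bits image representation are assumptions made for the example, and the actual operand formats of INDEX and TBITS are not described in this section.

```python
def pixel_offset(x, y, pixels_per_line):
    """Translate an X-Y position into a pixel offset with one multiply-add.

    Assumes a row-major frame buffer (hypothetical layout for illustration).
    """
    return y * pixels_per_line + x


def run_length(bits, start, value):
    """Count consecutive bits equal to `value`, beginning at index `start`."""
    count = 0
    while start + count < len(bits) and bits[start + count] == value:
        count += 1
    return count


scanline = [0, 0, 0, 1, 1, 1, 1, 0]
print(pixel_offset(x=5, y=2, pixels_per_line=300))  # 605
print(run_length(scanline, 0, 0))                   # 3
print(run_length(scanline, 3, 1))                   # 4
```

Run lengths of this kind are the raw material of run-length (RLL) coding, which is why a fast bit-string test is useful for font compression and FAX data.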


2.0 Architectural Description 


2.1 REGISTER SET 


The NS32CG16 CPU has 17 internal registers grouped according
to functions as follows: 8 general purpose, 7 address,
1 processor status and 1 configuration. Figure 2-1
shows the NS32CG16 internal registers.


[Figure: register set in four groups: Address (32 bits, including
INTBASE), General Purpose (32 bits), Processor Status, and
Configuration]


FIGURE 2-1. NS32CG16 Internal Registers


2.1.1 General Purpose Registers 


There are eight registers (R0-R7) used for satisfying the
high speed general storage requirements, such as holding
temporary variables and addresses. The general purpose
registers are free for any use by the programmer. They are
32 bits in length. If a general purpose register is specified for
an operand that is 8 or 16 bits long, only the low part of the
register is used; the high part is not referenced or modified.


2.1.2 Address Registers 


The seven address registers are used by the processor to 
implement specific address functions. Except for the MOD 
register that is 16 bits wide, all the others are 32 bits. In the 
NS32CG16 only the lower 24 bits are implemented in the six 
32-bit address registers. The top 8 bits are always zero. A 
description of the address registers follows. 


PC—Program Counter. The PC register is a pointer to the 
first byte of the instruction currently being executed. The PC 
is used to reference memory in the program section. 


SP0, SP1—Stack Pointers. The SP0 register points to the
lowest address of the last item stored on the INTERRUPT 
STACK. This stack is normally used only by the operating 
system. It is used primarily for storing temporary data, and 
holding return information for operating system subroutines 
and interrupt and trap service routines. The SP1 register 
points to the lowest address of the last item stored on the 
USER STACK. This stack is used by normal user programs 
to hold temporary data and subroutine return information. 


When a reference is made to the selected Stack Pointer
(see PSR S-bit), the terms ‘SP Register’ or ‘SP’ are used.
SP refers to either SP0 or SP1, depending on the setting of
the S bit in the PSR register. If the S bit in the PSR is 0, SP
refers to SP0. If the S bit in the PSR is 1 then SP refers to
SP1.


Stacks in the Series 32000 family grow downward in memo- 
ry. A Push operation pre-decrements the Stack Pointer by 
the operand length. A Pop operation post-increments the 
Stack Pointer by the operand length. 
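
The push and pop rules above can be modeled in a few lines of Python. This is only a sketch: the `Stack` class, its byte-array memory and the little-endian byte order (taken from the memory organization described in Section 2.2) are illustrative choices, and selection between SP0 and SP1 is left out.

```python
class Stack:
    """Downward-growing stack: pre-decrement on push, post-increment on pop."""

    def __init__(self, memory_size, initial_sp):
        self.mem = bytearray(memory_size)
        self.sp = initial_sp  # points to the lowest address of the last item

    def push(self, value, length):
        # Pre-decrement the stack pointer by the operand length, then store.
        self.sp -= length
        self.mem[self.sp:self.sp + length] = value.to_bytes(length, "little")

    def pop(self, length):
        # Load the operand, then post-increment by the operand length.
        value = int.from_bytes(self.mem[self.sp:self.sp + length], "little")
        self.sp += length
        return value


stack = Stack(memory_size=64, initial_sp=64)
stack.push(0x11223344, 4)   # SP moves from 64 to 60
stack.push(0xBEEF, 2)       # SP moves from 60 to 58
print(hex(stack.pop(2)), hex(stack.pop(4)), stack.sp)
```

After the two pops the stack pointer is back at its initial value, which is the property that lets subroutine calls and returns nest cleanly.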


FP—Frame Pointer. The FP register is used by a procedure 
to access parameters and local variables on the stack. The 
FP register is set up on procedure entry with the ENTER 
instruction and restored on procedure termination with the 
EXIT instruction. 


The frame pointer holds the address in memory occupied by 
the old contents of the frame pointer. 


SB—Static Base. The SB register points to the global variables
of a software module. This register is used to support
relocatable global variables for software modules. The SB
register holds the lowest address in memory occupied by
the global variables of a module.

INTBASE—Interrupt Base. The INTBASE register holds 
the address of the dispatch table for interrupts and traps 
(Section 3.2.1). 

MOD—Module. The MOD register holds the address of the
module descriptor of the currently executing software mod- 
ule. The MOD register is 16 bits long, therefore the module 
table must be contained within the first 64 kbytes of memo- 
ry. 

2.1.3 Processor Status Register 

The Processor Status Register (PSR) holds status informa- 
tion for the microprocessor. 


The PSR is sixteen bits long, divided into two eight-bit 
halves. The low order eight bits are accessible to all pro- 
grams, but the high order eight bits are accessible only to 
programs executing in Supervisor Mode. 
FIGURE 2-2. Processor Status Register (PSR)


C The C bit indicates that a carry or borrow occurred after 
an addition or subtraction instruction. It can be used with 
the ADDC and SUBC instructions to perform multiple- 
precision integer arithmetic calculations. It may have a 
setting of 0 (no carry or borrow) or 1 (carry or borrow). 


T The T bit causes program tracing. If this bit is set to 1, a 
TRC trap is executed after every instruction (Section 
3.3.1). 


L The L bit is altered by comparison instructions. In a
comparison instruction the L bit is set to “1” if the second
operand is less than the first operand, when both operands
are interpreted as unsigned integers. Otherwise, it
is set to “0”. In Floating-Point comparisons, this bit is
always cleared.


K Reserved for use by the CPU.

J Reserved for use by the CPU.


F The F bit is a general condition flag, which is altered by 
many instructions (e.g., integer arithmetic instructions 
use it to indicate overflow). 


Z The Z bit is altered by comparison instructions. In a
comparison instruction the Z bit is set to “1” if the second
operand is equal to the first operand; otherwise it is set
to “0”.

N The N bit is altered by comparison instructions. In a
comparison instruction the N bit is set to “1” if the second
operand is less than the first operand, when both
operands are interpreted as signed integers. Otherwise,
it is set to “0”.

U If the U bit is “1” no privileged instructions may be
executed. If the U bit is “0” then all instructions may be
executed. When U=0 the processor is said to be in
Supervisor Mode; when U=1 the processor is said to be in
User Mode. A User Mode program is restricted from
executing certain instructions and accessing certain registers
which could interfere with the operating system. For
example, a User Mode program is prevented from
changing the setting of the flag used to indicate its own
privilege mode. A Supervisor Mode program is assumed
to be a trusted part of the operating system, hence it has
no such restrictions.


S The S bit specifies whether the SP0 register or SP1 register
is used as the Stack Pointer. The bit is automatically
cleared on interrupts and traps. It may have a setting
of 0 (use the SP0 register) or 1 (use the SP1 register).


P The P bit prevents a TRC trap from occurring more than
once for an instruction (Section 3.3.1). It may have a
setting of 0 (no trace pending) or 1 (trace pending).

I If I=1, then all interrupts will be accepted. If I=0, only
the NMI interrupt is accepted. Trap enables are not
affected by this bit.




B Reserved for use by the CPU. This bit is set to 1 during 
the execution of the EXTBLT instruction and causes the 
BPU signal to become active. Upon reset, B is set to 
zero and the BPU signal is set high. 

Note 1: When an interrupt is acknowledged, the B, I, P, S and U bits are set
to zero and the BPU signal is set high. A return from interrupt will
restore the original values from the copy of the PSR register saved
in the interrupt stack.


Note 2: If BITBLT (BB) instructions are executed in an interrupt routine, the 
PSR bits J and K must be cleared first. 
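
The rules for the L, Z and N bits can be checked with a short model. The Python sketch below assumes integer operands of a fixed width; the function names and the dictionary return value are hypothetical conveniences, not part of the NS32CG16 definition.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of `value` as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def compare_flags(first, second, bits=32):
    """Return the L, Z and N settings produced by comparing two operands.

    L: second < first, both taken as unsigned integers.
    Z: second == first.
    N: second < first, both taken as signed integers.
    """
    mask = (1 << bits) - 1
    u_first, u_second = first & mask, second & mask
    return {
        "L": int(u_second < u_first),
        "Z": int(u_second == u_first),
        "N": int(to_signed(second, bits) < to_signed(first, bits)),
    }


# 0xFFFFFFFF is large when unsigned but is -1 when signed.
print(compare_flags(first=1, second=0xFFFFFFFF))
```

The example shows why both L and N exist: the same bit pattern can be the larger operand in an unsigned comparison and the smaller one in a signed comparison.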


2.1.4 Configuration Register 


The Configuration Register (CFG) is 8 bits wide, of which 
four bits are implemented. The implemented bits are used to 
declare the presence of certain external devices and to se- 
lect the clock scaling factor. CFG is programmed by the 
SETCFG instruction. The format of CFG is shown in Figure 
2-3. The various control bits are described below. 


FIGURE 2-3. Configuration Register (CFG) 


I Interrupt vectoring. This bit controls whether maskable
interrupts are handled in nonvectored (I=0) or vectored
(I=1) mode. Refer to Section 3.2.3 for more information.


F Floating-point instruction set. This bit indicates whether
a floating-point unit (FPU) is present to execute
floating-point instructions. If this bit is 0 when the CPU executes
a floating-point instruction, a Trap (UND) occurs. If this
bit is 1, then the CPU transfers the instruction and any
necessary operands to the FPU using the slave-processor
protocol described in Section 3.1.4.1.


M Clock scaling. This bit is used in conjunction with the C bit
to select the clock scaling factor.


C Clock scaling. Same as the M bit above. Refer to Sec- 
tion 3.2.1 on “Power Save Mode” for details. 


2.2 MEMORY ORGANIZATION 


The main memory of the NS32CG16 is a uniform linear ad- 
dress space. Memory locations are numbered sequentially 
starting at zero and ending at 2^24 - 1. The number specifying
a memory location is called an address. The contents of each
memory location is a byte consisting of eight bits. Unless
otherwise noted, diagrams in this document show data
stored in memory with the lowest address on the right and 
the highest address on the left. Also, when data is shown 
vertically, the lowest address is at the top of a diagram and 
the highest address at the bottom of the diagram. When bits 
are numbered in a diagram, the least significant bit is given 
the number zero, and is shown at the right of the diagram. 
Bits are numbered in increasing significance and toward the 
left. 


7 0 


Byte at Address A 


Two contiguous bytes are called a word. Except where not- 
ed, the least significant byte of a word is stored at the lower 
address, and the most significant byte of the word is stored 
at the next higher address. In memory, the address of a 
word is the address of its least significant byte, and a word 
may start at any address. 
Two contiguous words are called a double-word. Except
where noted, the least significant word of a double-word is 
stored at the lowest address and the most significant word 
of the double-word is stored at the address two higher. In 
memory, the address of a double-word is the address of its 
least significant byte, and a double-word may start at any 
address. 


MSB 


LSB 
Double Word at Address A 


Although memory is addressed as bytes, it is actually orga- 
nized as words. Therefore, words and double-words that are 
aligned to start at even addresses (multiples of two) are 
accessed more quickly than words and double-words that 
are not so aligned. 
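
The byte ordering and alignment rules above can be illustrated with a small Python model. The helper names and the byte-array memory are assumptions for the example; only the least-significant-byte-first ordering and the even-address alignment rule come from the text.

```python
def store(mem, address, value, size):
    """Store `value` in `size` bytes, least significant byte at `address`."""
    mem[address:address + size] = value.to_bytes(size, "little")


def load(mem, address, size):
    """Load a `size`-byte value whose least significant byte is at `address`."""
    return int.from_bytes(mem[address:address + size], "little")


def is_aligned(address):
    """Words and double-words at even addresses are accessed more quickly."""
    return address % 2 == 0


mem = bytearray(8)
store(mem, 2, 0x12345678, 4)        # double-word at address 2
print([hex(b) for b in mem[2:6]])   # least significant byte comes first
print(hex(load(mem, 2, 2)))         # least significant word of the double-word
print(is_aligned(2), is_aligned(3))
```

Reading a word at the same address as a double-word returns its least significant word, which follows directly from storing the least significant byte at the lowest address.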


2.2.1 Dedicated Tables 


Two of the NS32CG16 dedicated registers (MOD and INT- 
BASE) serve as pointers to dedicated tables in memory. 


The INTBASE register points to the Interrupt Dispatch and 
Cascade tables. These are described in Section 3.8. 


The MOD register contains a pointer into the Module Table, 
whose entries are called Module Descriptors. A Module De- 
scriptor contains four pointers, three of which are used by 
the NS32CG16. The MOD register contains the address of 
the Module Descriptor for the currently running module. It is 
automatically updated by the Call External Procedure in- 
structions (CXP and CXPD). 


The format of a Module Descriptor is shown in Figure 2-4. 
The Static Base entry contains the address of static data 
assigned to the running module. It is loaded into the CPU 
Static Base register by the CXP and CXPD instructions. The 
Program Base entry contains the address of the first byte of 
instruction code in the module. Since a module may have
multiple entry points, the Program Base pointer serves only
as a reference to find them.
STATIC BASE
LINK TABLE ADDRESS
PROGRAM BASE


FIGURE 2-4. Module Descriptor Format




The Link Table Address points to the Link Table for the 
currently running module. The Link Table provides the infor- 
mation needed for: 


1) Sharing variables between modules. Such variables 
are accessed through the Link Table via the External 
addressing mode. 


2) Transferring control from one module to another. This 
is done via the Call External Procedure (CXP) instruc- 
tion. 


The format of a Link Table is given in Figure 2-5. A Link 
Table Entry for an external variable contains the 32-bit ad- 
dress of that variable. An entry for an external procedure 
contains two 16-bit fields: Module and Offset. The Module 
field contains the new MOD register contents for the mod- 
ule being entered. The Offset field is an unsigned number 
giving the position of the entry point relative to the new 
module’s Program Base pointer. 


For further details of the functions of these tables, see the 
Series 32000 Instruction Set Reference Manual. 
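
The table lookups involved in an external procedure call can be sketched as follows. This Python model is an illustration under stated assumptions, not a description of the CXP microcode: it assumes the three Module Descriptor pointers are consecutive 32-bit values in the order shown in Figure 2-4, that Link Table entries are 32 bits each, and that a procedure entry holds Module in its low 16 bits and Offset in its high 16 bits (as the field order in Figure 2-5 suggests). All function names are hypothetical.

```python
def read32(mem, address):
    """Read a 32-bit value stored least significant byte first."""
    return int.from_bytes(mem[address:address + 4], "little")


def resolve_external_procedure(mem, mod, entry_number):
    """Follow MOD -> Link Table -> new Module Descriptor -> entry point."""
    link_table = read32(mem, mod + 4)            # assumed descriptor offset 4
    entry = read32(mem, link_table + 4 * entry_number)
    new_mod = entry & 0xFFFF                     # Module field (assumed low half)
    offset = entry >> 16                         # Offset field (assumed high half)
    new_sb = read32(mem, new_mod)                # assumed descriptor offset 0
    program_base = read32(mem, new_mod + 8)      # assumed descriptor offset 8
    return {"MOD": new_mod, "SB": new_sb, "PC": program_base + offset}


def write32(mem, address, value):
    mem[address:address + 4] = value.to_bytes(4, "little")


mem = bytearray(0x400)
write32(mem, 0x20 + 4, 0x200)        # current module: Link Table at 0x200
write32(mem, 0x30, 0x180)            # target module: Static Base
write32(mem, 0x30 + 8, 0x380)        # target module: Program Base
write32(mem, 0x200 + 4 * 2, (0x0010 << 16) | 0x0030)  # entry 2: offset, module
print(resolve_external_procedure(mem, mod=0x20, entry_number=2))
```

Because the Module field is only 16 bits wide, the target descriptor must lie in the first 64 kbytes of memory, which matches the restriction stated for the MOD register.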


ENTRY
ABSOLUTE ADDRESS      (VARIABLE)
ABSOLUTE ADDRESS      (VARIABLE)
OFFSET | MODULE       (PROCEDURE)

FIGURE 2-5. A Sample Link Table


2.3 INSTRUCTION SET


2.3.1 General Instruction Format


Figure 2-6 shows the general format of a Series 32000
instruction. The Basic Instruction is one to three bytes long
and contains the Opcode and up to two 5-bit General
Addressing Mode (“Gen”) fields. Following the Basic Instruction
field is a set of optional extensions, which may appear
depending on the instruction and the addressing modes
selected.


Index Bytes appear when either or both Gen fields specify 
Scaled Index. In this case, the Gen field specifies only the 
Scale Factor (1, 2, 4 or 8), and the Index Byte specifies 
which General Purpose Register to use as the index, and 
which addressing mode calculation to perform before index- 
ing. See Figure 2-7. 



Following Index Bytes come any displacements (addressing
constants) or immediate values associated with the selected
addressing modes. Each Disp/Imm field may contain
one or two displacements, or one immediate value. The size
of a Displacement field is encoded within the top bits of that 
field, as shown in Figure 2-8, with the remaining bits inter- 
preted as a signed (two’s complement) value. The size of an 
immediate value is determined from the Opcode field. Both 
Displacement and Immediate fields are stored most-signifi- 
cant byte first. Note that this is different from the memory 
representation of data (Section 2.2). 
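
A decoder for displacement fields might look like the sketch below. The ranges match those listed for byte, word and double-word displacements; the prefix bit patterns used to tell the sizes apart (0 for byte, 10 for word, 11 for double word) follow the usual Series 32000 encoding and should be treated as an assumption here, since the encoding figure is not legible in this copy. The function name is hypothetical and no validation is performed.

```python
def decode_displacement(code, position=0):
    """Decode one displacement from instruction bytes.

    Returns (signed_value, field_length_in_bytes). The field is stored
    most significant byte first, with its size encoded in the top bits.
    """
    first = code[position]
    if first & 0x80 == 0:          # prefix 0: one byte, 7-bit value
        size, value_bits = 1, 7
    elif first & 0xC0 == 0x80:     # prefix 10: two bytes, 14-bit value
        size, value_bits = 2, 14
    else:                          # prefix 11: four bytes, 30-bit value
        size, value_bits = 4, 30
    raw = int.from_bytes(code[position:position + size], "big")
    raw &= (1 << value_bits) - 1   # strip the size prefix
    if raw & (1 << (value_bits - 1)):
        raw -= 1 << value_bits     # two's complement sign extension
    return raw, size


print(decode_displacement(bytes([0x3F])))                    # (63, 1)
print(decode_displacement(bytes([0x40])))                    # (-64, 1)
print(decode_displacement(bytes([0x9F, 0xFF])))              # (8191, 2)
print(decode_displacement(bytes([0xA0, 0x00])))              # (-8192, 2)
print(decode_displacement(bytes([0xC0, 0x00, 0x01, 0x00])))  # (256, 4)
```

Encoding the size in the field itself lets small constants occupy a single byte while still allowing displacements that span the whole address space.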


Some instructions require additional ‘implied’ immediates 
and/or displacements, apart from those associated with ad- 
dressing modes. Any such extensions appear at the end of 
the instruction, in the order that they appear within the list of 
operands in the instruction definition (Section 2.3.3). 


TL/EE/9424-5 
FIGURE 2-7. Index Byte Format 


2.3.2 Addressing Modes 


The NS32CG16 CPU generally accesses an operand by cal- 
culating its Effective Address based on information avail- 
able when the operand is to be accessed. The method to be 
used in performing this calculation is specified by the pro- 
grammer as an “addressing mode.” 


Addressing modes in the NS32CG16 are designed to opti- 
mally support high-level language accesses to variables. In 
nearly all cases, a variable access requires only one ad- 
dressing mode, within the instruction that acts upon that 
variable. Extraneous data movement is therefore minimized. 


NS32CG16 Addressing Modes fall into nine basic types: 


Register: The operand is available in one of the eight Gen- 
eral Purpose Registers. In certain Slave Processor instruc- 
tions, an auxiliary set of eight registers may be referenced 
instead. 


Register Relative: A General Purpose Register contains an 
address to which is added a displacement value from the 
instruction, yielding the Effective Address of the operand in 
memory. 


[Figure: instruction layout showing the optional extensions (index
bytes, Disp/Imm fields, implied immediate operands) and the basic
instruction (Gen fields, opcode)]


FIGURE 2-6. General Instruction Format



Memory Space: Identical to Register Relative above, ex- 
cept that the register used is one of the dedicated registers 
PC, SP, SB or FP. These registers point to data areas gen- 
erally needed by high-level languages. 


Memory Relative: A pointer variable is found within the 
memory space pointed to by the SP, SB or FP register. A 
displacement is added to that pointer to generate the Effec- 
tive Address of the operand. 


Byte Displacement: Range -64 to +63

Word Displacement: Range -8192 to +8191

Double Word Displacement: Range (Entire Addressing Space)


FIGURE 2-8. Displacement Encodings

Immediate: The operand is encoded within the instruction.
This addressing mode is not allowed if the operand is to be 
written. 


Absolute: The address of the operand is specified by a 
displacement field in the instruction. 


External: A pointer value is read from a specified entry of 
the current Link Table. To this pointer value is added a dis- 
placement, yielding the Effective Address of the operand. 


Top of Stack: The currently-selected Stack Pointer (SP0 or
SP1) specifies the location of the operand. The operand is 
pushed or popped, depending on whether it is written or 
read. 


Scaled Index: Although encoded as an addressing mode, 
Scaled Indexing is an option on any addressing mode ex- 
cept Immediate or another Scaled Index. It has the effect of 
calculating an Effective Address, then multiplying any Gen- 
eral Purpose Register by 1, 2, 4 or 8 and adding into the 
total, yielding the final Effective Address of the operand. 


Table 2-1 is a brief summary of the addressing modes. For a 
complete description of their actions, see the Series 32000 
Instruction Set Reference Manual. 


In addition to the general modes, Register-Indirect with 
auto-increment/decrement and warps or pitch are available 
on several of the graphics instructions. 
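
Three of the effective address calculations described above are sketched below in Python. The function names are hypothetical, `read_pointer` stands in for a 32-bit memory read, and wrapping results to 24 bits is an assumption based on the 16 Mbyte address space rather than a documented behavior.

```python
ADDRESS_MASK = (1 << 24) - 1  # assumed: addresses wrap within 16 Mbytes


def ea_register_relative(register, disp):
    """Register Relative and Memory Space: Disp + Register."""
    return (register + disp) & ADDRESS_MASK


def ea_memory_relative(read_pointer, register, disp1, disp2):
    """Memory Relative: Disp2 + Pointer, pointer found at Disp1 + Register."""
    pointer = read_pointer((register + disp1) & ADDRESS_MASK)
    return (pointer + disp2) & ADDRESS_MASK


def ea_scaled_index(base_ea, index_register, scale):
    """Scaled Index: EA(mode) + scale * Rn, with scale 1, 2, 4 or 8."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    return (base_ea + scale * index_register) & ADDRESS_MASK


pointers = {0x2010: 0x5000}                     # pointer variable in a frame
base = ea_memory_relative(pointers.__getitem__, 0x2000, 0x10, 8)
print(hex(ea_register_relative(0x1000, -4)))    # 0xffc
print(hex(base))                                # 0x5008
print(hex(ea_scaled_index(base, 3, 4)))         # 0x5014
```

The last call corresponds to indexing the fourth double-word of an array whose address was reached through a pointer, all within a single operand reference.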


TABLE 2-1. NS32CG16 Addressing Modes

ENCODING  MODE                       ASSEMBLER SYNTAX    EFFECTIVE ADDRESS

Register
00000     Register 0                 R0 or F0            None: Operand is in the specified
00001     Register 1                 R1 or F1            register.
00010     Register 2                 R2 or F2
00011     Register 3                 R3 or F3
00100     Register 4                 R4 or F4
00101     Register 5                 R5 or F5
00110     Register 6                 R6 or F6
00111     Register 7                 R7 or F7

Register Relative
01000     Register 0 relative        disp(R0)            Disp + Register.
01001     Register 1 relative        disp(R1)
01010     Register 2 relative        disp(R2)
01011     Register 3 relative        disp(R3)
01100     Register 4 relative        disp(R4)
01101     Register 5 relative        disp(R5)
01110     Register 6 relative        disp(R6)
01111     Register 7 relative        disp(R7)

Memory Relative
10000     Frame memory relative      disp2(disp1(FP))    Disp2 + Pointer; Pointer found at
10001     Stack memory relative      disp2(disp1(SP))    address Disp1 + Register. "SP"
10010     Static memory relative     disp2(disp1(SB))    is either SP0 or SP1, as selected
                                                         in PSR.

Reserved
10011     (Reserved for Future Use)

Immediate
10100     Immediate                  value               None: Operand is input from
                                                         instruction queue.

Absolute
10101     Absolute                   @disp               Disp.

External
10110     External                   EXT(disp1) + disp2  Disp2 + Pointer; Pointer is found
                                                         at Link Table Entry number Disp1.

Top Of Stack
10111     Top of stack               TOS                 Top of current stack, using either
                                                         User or Interrupt Stack Pointer,
                                                         as selected in PSR. Automatic
                                                         Push/Pop included.

Memory Space
11000     Frame memory               disp(FP)            Disp + Register; "SP" is either
11001     Stack memory               disp(SP)            SP0 or SP1, as selected in PSR.
11010     Static memory              disp(SB)
11011     Program memory             *+disp

Scaled Index
11100     Index, bytes               mode[Rn:B]          EA (mode) + Rn.
11101     Index, words               mode[Rn:W]          EA (mode) + 2×Rn.
11110     Index, double words        mode[Rn:D]          EA (mode) + 4×Rn.
11111     Index, quad words          mode[Rn:Q]          EA (mode) + 8×Rn.
                                                         "Mode" and "n" are contained
                                                         within the Index Byte.
                                                         EA (mode) denotes the effective
                                                         address generated using mode.



2.3.3 Instruction Set Summary 


Table 2-2 presents a brief description of the NS32CG16
instruction set. The Format column refers to the Instruction
Format tables (Appendix A). The Instruction column gives
the instruction as coded in assembly language, and the
Description column provides a short description of the function
provided by that instruction. Further details of the exact
operations performed by each instruction may be found in the
Series 32000 Instruction Set Reference Manual and the
NS32CG16 Printer/Display Processor Programmer's Reference.
Notations:
i = Integer length suffix: B = Byte
                           W = Word
                           D = Double Word
f = Floating Point length suffix: F = Standard Floating
                                  L = Long Floating
gen = General operand. Any addressing mode can be specified.
short = A 4-bit value encoded within the Basic Instruction
(see Appendix A for encodings).
imm = Implied immediate operand. An 8-bit value appended
after any addressing extensions.
disp = Displacement (addressing constant): 8, 16 or 32 bits.
All three lengths legal.
reg = Any General Purpose Register: R0-R7.
areg = Any Processor Register: SP, SB, FP, INTBASE,
MOD, PSR, US (bottom 8 PSR bits).
cond = Any condition code, encoded as a 4-bit field within
the Basic Instruction (see Appendix A for encodings).


TABLE 2-2. NS32CG16 Instruction Set Summary 


MOVES 
Format Operation Operands Description 
4 MOVi gen,gen Move a value.
2 MOVQi short,gen Extend and move a signed 4-bit constant. 
7 MOVMi gen,gen,disp Move multiple: disp bytes (1 to 16). 
7 MOVZBW gen,gen Move with zero extension. 
7 MOVZiD gen,gen Move with zero extension. 
7 MOVXBW gen,gen Move with sign extension. 
7 MOVXiD gen,gen Move with sign extension. 
4 ADDR gen,gen Move effective address. 
INTEGER ARITHMETIC 
Format Operation Operands Description 
4 ADDi gen,gen Add. 
2 ADDQi short,gen Add signed 4-bit constant. 
4 ADDCi gen,gen Add with carry.
4 SUBi gen,gen Subtract. 
4 SUBCi gen,gen Subtract with carry (borrow). 
6 NEGi gen,gen Negate (2’s complement). 
6 ABSi gen,gen Take absolute value. 
7 MULi gen,gen Multiply. 
7 QUOi gen,gen Divide, rounding toward zero. 
7 REMi gen,gen Remainder from QUO. 
7 DIVi gen,gen Divide, rounding down. 
7 MODi gen,gen Remainder from DIV (Modulus). 
7 MEIi gen,gen Multiply to extended integer.
7 DEIi gen,gen Divide extended integer.
PACKED DECIMAL (BCD) ARITHMETIC 
Format Operation Operands Description 
6 ADDPi gen,gen Add packed.
6 SUBPi gen,gen Subtract packed. 
TABLE 2-2. NS32CG16 Instruction Set Summary (Continued)

INTEGER COMPARISON
Format Operation Operands Description
4 CMPi gen,gen Compare.
2 CMPQi short,gen Compare to signed 4-bit constant.
7 CMPMi gen,gen,disp Compare multiple: disp bytes (1 to 16).
LOGICAL AND BOOLEAN
Format Operation Operands Description
4 ANDi gen,gen Logical AND.
4 ORi gen,gen Logical OR.
4 BICi gen,gen Clear selected bits.
4 XORi gen,gen Logical exclusive OR.
6 COMi gen,gen Complement all bits.
6 NOTi gen,gen Boolean complement: LSB only.
2 Scondi gen Save condition code (cond) as a Boolean variable of size i.
SHIFTS
Format Operation Operands Description
6 LSHi gen,gen Logical shift, left or right.
6 ASHi gen,gen Arithmetic shift, left or right.
6 ROTi gen,gen Rotate, left or right.
BIT FIELDS

Bit fields are values in memory that are not aligned to byte boundaries. Examples are PACKED arrays and records used in 
Pascal. “Extract” instructions read and align a bit field. “Insert” instructions write a bit field from an aligned source. 


Format Operation Operands Description
8 EXTi reg,gen,gen,disp Extract bit field (array oriented).
8 INSi reg,gen,gen,disp Insert bit field (array oriented).
7 EXTSi gen,gen,imm,imm Extract bit field (short form).
7 INSSi gen,gen,imm,imm Insert bit field (short form).
8 CVTP reg,gen,gen Convert to bit field pointer.
ARRAYS
Format Operation Operands Description
8 CHECKi reg,gen,gen Index bounds check.
8 INDEXi reg,gen,gen Recursive indexing step for multiple-dimensional arrays.
TABLE 2-2. NS32CG16 Instruction Set Summary (Continued)

STRINGS

String instructions assign specific functions to the General
Purpose Registers:
R4 - Comparison Value
R3 - Translation Table Pointer
R2 - String 2 Pointer
R1 - String 1 Pointer
R0 - Limit Count

Options on all string instructions are:
B (Backward): Decrement string pointers after each step rather than incrementing.
U (Until match): End instruction if String 1 entry matches R4.
W (While match): End instruction if String 1 entry does not match R4.
All string instructions end when R0 decrements to zero.

Format Operation Operands Description
5 MOVSi options Move string 1 to string 2.
5 MOVST options Move string, translating bytes.
5 CMPSi options Compare string 1 to string 2.
5 CMPST options Compare, translating string 1 bytes.
5 SKPSi options Skip over string 1 entries.
5 SKPST options Skip, translating bytes for until/while.
JUMPS AND LINKAGE
Format Operation Operands Description
3 JUMP gen Jump.
0 BR disp Branch (PC Relative).
0 Bcond disp Conditional branch.
3 CASEi gen Multiway branch.
2 ACBi short,gen,disp Add 4-bit constant and branch if non-zero.
3 JSR gen Jump to subroutine.
1 BSR disp Branch to subroutine.
1 CXP disp Call external procedure.
3 CXPD gen Call external procedure using descriptor.
1 SVC Supervisor call.
1 FLAG Flag trap.
1 BPT Breakpoint trap.
1 ENTER [reg list],disp Save registers and allocate stack frame (Enter Procedure).
1 EXIT [reg list] Restore registers and reclaim stack frame (Exit Procedure).
1 RET disp Return from subroutine.
1 RXP disp Return from external procedure call.
1 RETT disp Return from trap. (Privileged)
1 RETI Return from interrupt. (Privileged)
CPU REGISTER MANIPULATION
Format Operation Operands Description
1 SAVE [reg list] Save general purpose registers.
1 RESTORE [reg list] Restore general purpose registers.
2 LPRi areg,gen Load dedicated register. (Privileged if PSR or INTBASE)
2 SPRi areg,gen Store dedicated register. (Privileged if PSR or INTBASE)
3 ADJSPi gen Adjust stack pointer.
3 BISPSRi gen Set selected bits in PSR. (Privileged if not Byte length)
3 BICPSRi gen Clear selected bits in PSR. (Privileged if not Byte length)
5 SETCFG [option list] Set configuration register. (Privileged)


TABLE 2-2. NS32CG16 Instruction Set Summary (Continued) 
FLOATING POINT 


Format Operation Operands Description 
11 MOVf gen,gen Move a floating point value. 
9 MOVLF gen,gen Move and shorten a long value to standard. 
9 MOVFL gen,gen Move and lengthen a standard value to long. 
9 MOVif gen,gen Convert any integer to standard or long floating. 
9 ROUNDfi gen,gen Convert to integer by rounding. 
9 TRUNCfi gen,gen Convert to integer by truncating, toward zero. 
9 FLOORfi gen,gen Convert to largest integer less than or equal to value.
11 ADDf gen,gen Add. 
11 SUBf gen,gen Subtract. 
11 MULf gen,gen Multiply. 
11 DIVf gen,gen Divide. 
11 CMPf gen,gen Compare. 
11 NEGf gen,gen Negate. 
11 ABSf gen,gen Take absolute value. 
9 LFSR gen Load FSR. 
9 SFSR gen Store FSR. 
12 POLYf gen,gen Polynomial Step. 
12 DOTf gen,gen Dot Product.
12 SCALBf gen,gen Binary Scale. 
12 LOGBf gen,gen Binary Log. 
MISCELLANEOUS 
Format Operation Operands Description 
1 NOP No operation. 
1 WAIT Wait for interrupt. 
1 DIA Diagnose. Single-byte “Branch to Self” for hardware 
breakpointing. Not for use in programming. 
GRAPHICS 
Format Operation Operands Description 
5 BBOR options* Bit-aligned block transfer ‘OR’. 
5 BBAND options Bit-aligned block transfer ‘AND’. 
5 BBFOR Bit-aligned block transfer fast ‘OR’. 
5 BBXOR options Bit-aligned block transfer ‘XOR’.
5 BBSTOD options Bit-aligned block source to destination. 
5 BITWT Bit-aligned word transfer. 
5 EXTBLT options External bit-aligned block transfer. 
5 MOVMPi Move multiple pattern. 
5 TBITS options Test bit string. 
5 SBITS Set bit string. 
5 SBITPS Set bit perpendicular string. 
BITS 
Format Operation Operands Description 
4 TBITi gen,gen Test bit. 
6 SBITi gen,gen Test and set bit. 
6 SBITIi gen,gen Test and set bit, interlocked. 
6 CBITi gen,gen Test and clear bit. 
6 CBITIi gen,gen Test and clear bit, interlocked. 
6 IBITi gen,gen Test and invert bit. 
8 FFSi gen,gen Find first set bit. 


*Note: Options are controlled by fields of the instruction, PSR status bits, or dedicated register values. 
2.4 GRAPHICS SUPPORT 


The following sections provide a brief description of the 
NS32CG16 graphics support capabilities. Basic discussions 
on frame buffer addressing and BITBLT operations are also 
provided. More detailed information on the NS32CG16 
graphics support instructions can be found in the 
NS32CG16 Printer/Display Processor Programmer’s Refer- 
ence. 


2.4.1 Frame Buffer Addressing 


There are two basic addressing schemes for referencing
pixels within the frame buffer: Linear and Cartesian (or x-y).
Linear addressing associates a single number to each pixel
representing the physical address of the corresponding bit
in memory. Cartesian addressing associates two numbers
to each pixel representing the x and y coordinates of the
pixel relative to a point in the Cartesian space taken as the
origin. The Cartesian space is generally defined as having
the origin in the upper left. A movement to the right
increases the x coordinate; a movement downward increases the y
coordinate.


The correspondence between the location of a pixel in the 
Cartesian space and the physical (BIT) address in memory 
is shown in Figure 2-9. The origin of the Cartesian space 
(x=0, y=0) corresponds to the bit address ‘ORG’. Incre- 
menting the x coordinate increments the bit address by one. 
Incrementing the y coordinate increments the bit address by 
an amount representing the warp (or pitch) of the Cartesian 
space. Thus, the linear address of a pixel at location (x, y) in 
the Cartesian space can be found by the following expres- 
sion. 


ADDR = ORG + y * WARP + x 


Warp is the distance (in bits) in the physical memory space 
between two vertically adjacent bits in the Cartesian space. 
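
The address expression above can be checked with a short Python sketch. The function names and the sample frame buffer (origin at bit address 0, warp of 32 bits) are assumptions chosen for illustration; the split into a byte address and a bit number within the byte is a convenience of this sketch, not something the text prescribes.

```python
def pixel_bit_address(org: int, warp: int, x: int, y: int) -> int:
    """Linear (bit) address of the pixel at Cartesian location (x, y)."""
    return org + y * warp + x

def split_bit_address(bit_addr: int) -> tuple[int, int]:
    """Split a bit address into (byte address, bit number within the byte)."""
    return bit_addr >> 3, bit_addr & 7

# A frame buffer 32 pixels wide whose origin is at bit address 0.
addr = pixel_bit_address(org=0, warp=32, x=4, y=16)
print(hex(addr), split_bit_address(addr))
```

Because the warp is expressed in bits, it does not have to equal the visible line width; a frame buffer with padding at the end of each scan line simply uses a larger warp.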


Example 1 below shows two NS32CG16 instruction se- 
quences to set a single pixel given the x and y coordinates. 
Example 2 shows how to create a fat pixel by setting four 
adjacent bits in the Cartesian space. 


Example 1: Set pixel at location (x, y)
Setup: R0 x coordinate
       R1 y coordinate


Instruction Sequence 1:


MULD WARP, R1 ; Y*WARP
ADDD R0, R1 ; + X = BIT OFFSET
SBITD R1, ORG ; SET PIXEL


Instruction Sequence 2:


INDEXD R1, (WARP-1), R0 ; Y*WARP + X
SBITD R1, ORG ; SET PIXEL
Example 2: Create fat pixel by setting bits at locations
(x, y), (x+1, y), (x, y+1) and (x+1, y+1).


Setup: R0 x coordinate
       R1 y coordinate


Instruction Sequence:


INDEXD R1, (WARP-1), R0 ; BIT ADDRESS
SBITD R1, ORG ; SET FIRST PIXEL
ADDQD 1, R1 ; (X+1, Y)
SBITD R1, ORG ; SECOND PIXEL
ADDD (WARP-1), R1 ; (X, Y+1)
SBITD R1, ORG ; THIRD PIXEL
ADDQD 1, R1 ; (X+1, Y+1)
SBITD R1, ORG ; LAST PIXEL
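
The register arithmetic in the fat-pixel sequence can be traced with the following Python sketch. It assumes the usual Series 32000 definition of INDEXi, accum = accum * (length + 1) + index, which is why the length operand is WARP-1. The function names and sample values are hypothetical and serve only to show that the four bit offsets come out as intended.

```python
def index_step(accum: int, length: int, index: int) -> int:
    """Model of INDEXi: accum = accum * (length + 1) + index."""
    return accum * (length + 1) + index

def fat_pixel_offsets(x: int, y: int, warp: int) -> list[int]:
    """Bit offsets (relative to ORG) touched by the four SBITD instructions."""
    r1 = index_step(y, warp - 1, x)  # INDEXD R1,(WARP-1),R0
    offsets = [r1]                   # first pixel  (x, y)
    r1 += 1                          # ADDQD 1,R1
    offsets.append(r1)               # second pixel (x+1, y)
    r1 += warp - 1                   # ADDD (WARP-1),R1
    offsets.append(r1)               # third pixel  (x, y+1)
    r1 += 1                          # ADDQD 1,R1
    offsets.append(r1)               # last pixel   (x+1, y+1)
    return offsets

print(fat_pixel_offsets(x=3, y=2, warp=32))
```

Adding WARP-1 rather than WARP in the third step compensates for the earlier increment of the x position, so the third pixel lands directly below the first.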


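[Figure 2-9: the top row of the Cartesian space holds bit addresses ORG, ORG+1, ORG+2, ...; successive rows begin at ORG + WARP, ORG + 2*WARP, ...; the pixel at (X, Y) has bit address ORG + Y*WARP + X.]


TL/EE/9424-61


FIGURE 2-9. Correspondence between Linear and
Cartesian Addressing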


2.4.2 BITBLT Fundamentals 


BITBLT, BIT-aligned BLock Transfer, is a general opera- 
tor that provides a mechanism to move an arbitrary size 
rectangle of an image from one part of the frame buffer 
to another. During the data transfer process a bitwise 
logical operation can be performed between the source 
and the destination data. BITBLT is also called Raster- 
Op: operations on rasters. It defines two rectangular ar- 
eas, source and destination, and performs a logical oper- 
ation (e.g., AND, OR, XOR) between these two areas and 
stores the result back to the destination. It can be ex- 
pressed in simple notation as: 


Source op Destination → Destination
op: AND, OR, XOR, etc. 


2.4.2.1 Frame Buffer Architecture 


There are two basic types of frame buffer architectures: 
plane-oriented or pixel-oriented. BITBLT takes advantage of 
the plane-oriented frame buffer architecture’s attribute of 
multiple, adjacent pixels-per-word, facilitating the movement 
of large blocks of data. The source and destination starting 
addresses are expressed as pixel addresses. The width and 
height of the block to be moved are expressed in terms of 
pixels and scan lines. The source block may start and end 
at any bit position of any word, and the same applies for the 
destination block. 


2.4.2.2 Bit Alignment 


Before a logical operation can be performed between the 
source and the destination data, the source data must first 
be bit aligned to the destination data. In Figure 2-10, the 
source data needs to be shifted three bits to the right in 
order to align the first pixel (i.e., the pixel at the top left 
corner) in the source data block to the first pixel in the desti- 
nation data block. 


2.4.2.3 Block Boundaries and Destination Masks 


Each BITBLT destination scan line may start and end at any 
bit position in any data word. The neighboring bits (bits shar- 
ing the same word address with any words in the destination 
data block, but not a part of the BITBLT rectangle) of the 
BITBLT destination scan line must remain unchanged after 
the BITBLT operation.


Due to the plane-oriented frame buffer architecture, all 
memory operations must be word-aligned. In order to pre- 
serve the neighboring bits surrounding the BITBLT destina- 
tion block, both a left mask and a right mask are needed for 
all the leftmost and all the rightmost data words of the desti- 
nation block. The left mask and the right mask both remain 
the same during a BITBLT operation. 


The following example illustrates the bit alignment require- 
ments. In this example, the memory data path is 16 bits 
wide. Figure 2-10 shows a 32 pixel by 32 scan line frame 
buffer which is organized as a long bit stream which wraps 
around every two words (32 bits). The origin (top left corner) 
of the frame buffer starts from the lowest word in memory 
(word address 00 (hex)). 


Each word in the memory contains 16 bits, D0-D15. The
least significant bit of a memory word, D0, is defined as the
first displayed pixel in a word. In this example, BITBLT ad-
dresses are expressed as pixel addresses relative to the 
origin of the frame buffer. The source block starting address 
is 021 (hex) (the second pixel in the third word). The desti- 
nation block starting address is 204 (hex) (the fifth pixel in 
the 33rd word). The block width is 13 (hex), and the height is 
06 (hex) (corresponding to 6 scan lines). The shift value is 3. 
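
The word positions and the shift value quoted in this example can be reproduced with a small Python sketch. The helper names are hypothetical, and taking the shift as the difference of the two bit positions modulo the word size is an assumption that matches this example; it is not a statement of how the BITBLT instructions expect the shift to be encoded in every case.

```python
WORD_BITS = 16  # memory data path width used in this example

def locate(pixel_addr: int) -> tuple[int, int]:
    """Return (word index, pixel position within the word), both zero-based."""
    return divmod(pixel_addr, WORD_BITS)

def shift_value(src_pixel_addr: int, dst_pixel_addr: int) -> int:
    """Number of bit positions the source must move to line up with the destination."""
    return (dst_pixel_addr - src_pixel_addr) % WORD_BITS

print(locate(0x021))              # source: third word, second pixel
print(locate(0x204))              # destination: 33rd word, fifth pixel
print(shift_value(0x021, 0x204))  # shift value
```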


[Figure 2-10: the frame buffer drawn as a grid. Pixel numbers within words (0-F, 0-F) run across the top; word addresses 00 through 3E (hex) run down the left side. The source block is marked S and the destination block is marked D; each block is six scan lines high.]


FIGURE 2-10. 32-Pixel by 32-Scan Line Frame Buffer
FIGURE 2-11. Overlapping BITBLT Blocks


The left mask and the right mask are 0000,1111,1111,1111 
and 1111,1111,0000,0000 respectively. 


Note 1: Zeros in either the left mask or the right mask indicate the destina- 
tion bits which will not be modified. 


Note 2: The BB(function) and EXTBLT instructions use different setup
parameters and techniques.


2.4.2.4 BITBLT Directions


A BITBLT operation moves a rectangular block of data in a
frame buffer. The operation itself can be considered as a
subroutine with two nested loops. The loops are preceded
by setup operations. In the outer loop the source and
destination starting addresses are calculated, and the test for
completion is performed. In the inner loop the actual data
movement for a single scan line takes place. The length of
the inner loop is the number of (aligned) words spanned by
each scan line. The length of the outer loop is equal to the
height (number of scan lines) of the block to be moved. A
skeleton of the subroutine representing the BITBLT
operation follows.


BITBLT:     calculate BITBLT setup parameters;
            (once per BITBLT operation).
            such as
              width, height
              bit misalignment (shift number)
              left, right masks
              horizontal, vertical directions
              etc.

OUTERLOOP:  calculate source, dest addresses;
            (once per scanline).

INNERLOOP:  move data, (logical operation) and increment addresses;
            (once per word).

            UNTIL done horizontally
            UNTIL done vertically
            RETURN (from BITBLT).


Note: In the NS32CG16 only the setup operations must be done by the 
programmer. The inner and outer loops are automatically executed 
by the BITBLT instructions. 

Each loop can be executed in one of two directions: the
inner loop from left to right or right to left, the outer loop
from top to bottom (down) or bottom to top (up).


The ability to move data starting from any corner of the 
BITBLT rectangle is necessary to avoid destroying the 
BITBLT source data as a result of destination writes when 
the source and destination are overlapped (i.e., when they 
share pixels). This situation is routinely encountered while 
panning or scrolling. 


A determination of the correct execution directions of the 
BITBLT must be performed whenever the source and desti- 
nation rectangles overlap. Any overlap will result in the de- 
struction of source data (from a destination write) if the cor- 
rect vertical direction is not used. Horizontal BITBLT direc- 
tion is of concern only in certain cases of overlap, as will be 
explained below. 


Figures 2-11(a) and (b) illustrate two cases of overlap. Here,
the BITBLT rectangles are three pixels wide by five scan
lines high; they overlap by a single pixel in (a) and a single
column of pixels in (b). For purposes of illustration, the
BITBLT is assumed to be carried out pixel-by-pixel. This
convention does not affect the conclusions.


In Figure 2-11(a), if the BITBLT is performed in the UP direc- 
tion (bottom-to-top) one of the transfers of the bottom scan 
line of the source will write to the circled pixel of the destina- 
tion. Due to the overlap, this pixel is also part of the upper- 
most scan line of the source rectangle. Thus, data needed 
later is destroyed. Therefore, this BITBLT must be performed
in the DOWN direction. Another example of this occurs
any time the screen is moved in a purely vertical direction,
as in scrolling text. It should be noted that, in both of
these cases, the choice of horizontal BITBLT direction may
be made arbitrarily.


Figure 2-11(b) demonstrates a case in which the horizontal 
BITBLT direction may not be chosen arbitrarily. This is an 
instance of purely horizontal movement of data (panning). 
Because the movement from source to destination involves 
data within the same scan line, the incorrect direction of 
movement will overwrite data which will be needed later. In 
this example, the correct direction is from right to left. 
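
The direction rules discussed above follow the same reasoning as an overlapping memory move, and can be sketched in Python. The function names, the direction labels and the use of the top-left corners of the two rectangles as inputs are assumptions made for this illustration; the sketch also assumes the y coordinate increases downward, as defined in Section 2.4.1.

```python
def choose_directions(src_x: int, src_y: int, dst_x: int, dst_y: int) -> tuple[str, str]:
    """Pick (vertical, horizontal) directions that never overwrite unread source data."""
    # Destination lower on the screen: start at the bottom scan line and move up.
    vertical = "up" if dst_y > src_y else "down"
    # Horizontal direction only matters when source and destination share scan lines.
    if dst_y == src_y and dst_x > src_x:
        horizontal = "right_to_left"
    else:
        horizontal = "left_to_right"
    return vertical, horizontal

def copy_row(row: list, src: int, dst: int, width: int, direction: str) -> list:
    """Pixel-by-pixel copy within one scan line, in the given direction."""
    order = range(width) if direction == "left_to_right" else range(width - 1, -1, -1)
    for i in order:
        row[dst + i] = row[src + i]
    return row

# Panning three pixels to the right by two positions (destination overlaps source).
print("".join(copy_row(list("abc.."), 0, 2, 3, "right_to_left")))  # correct copy
print("".join(copy_row(list("abc.."), 0, 2, 3, "left_to_right")))  # source overwritten
```

In the second call the first store overwrites a source pixel that has not been read yet, which is the data destruction the text warns about.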


2.4.2.5 BITBLT Variations 


The "classical" definition of BITBLT, as described in
"Smalltalk-80: The Language and its Implementation" by Adele
Goldberg and David Robson, provides for three operands:
source, destination and mask/texture. This third operand is
commonly used in monochrome systems to incorporate a
stipple pattern into an area. These stipple patterns provide
the appearance of multiple shades of gray in single-bit-per-pixel
systems, in a manner similar to the "halftone" process
used in printing.

Texture op1 Source op2 Destination → Destination


While the NS32CG16 and the external BPU (if used) are 
essentially two-operand devices, three-operand BITBLT op- 
erations can be implemented quite flexibly and efficiently by 
performing the two operations serially. 


2.4.3 GRAPHICS SUPPORT INSTRUCTIONS 


The NS32CG16 provides eleven instructions for supporting 
graphics oriented applications. These instructions are divid- 
ed into three groups according to the operations they per- 
form. General descriptions for each of them and the related 
formats are provided in the following sections. 


2.4.3.1 BITBLT (BIT-aligned BLock Transfer) 


This group includes seven instructions. They are used to 
move characters and objects into the frame buffer which will 
be printed or displayed. One of the instructions works in 
conjunction with an external BITBLT Processing Unit (BPU) 
to maximize performance. The other six are executed by the 
NS32CG16. 


BIT-aligned BLock Transfer 
Syntax: BB(function) Options 


Setup: R0 base address, source data
       R1 base address, destination data
       R2 shift value
       R3 height (in lines)
       R4 first mask
       R5 second mask
       R6 source warp (adjusted)
       R7 destination warp (adjusted)
       0(SP) width (in words)

Function: AND, OR, XOR, FOR, STOD

Options: IA Increasing Address (default option).
            When IA is selected, scan lines are
            transferred in the increasing BIT/BYTE
            order.
         DA Decreasing Address.
         S  True Source (default option).
         -S Inverted Source.
These five instructions perform standard BITBLT operations
between source and destination blocks. The operations
available include the following:


BBAND:  src AND dst
        -src AND dst
BBOR:   src OR dst
        -src OR dst
BBXOR:  src XOR dst
        -src XOR dst
BBFOR:  src OR dst
BBSTOD: src TO dst
        -src TO dst


'src' and '-src' stand for 'True Source' and 'Inverted
Source' respectively; 'dst' stands for 'Destination'.

Note 1: For speed reasons, the BB instructions require the masks to be
specified with respect to the source block. In Figure 2-10 masking
was defined relative to the destination block.

Note 2: The options -S and DA are not available for the BBFOR instruction.

Note 3: BBFOR performs the same operation as BBOR with IA and S options.

Note 4: IA and DA are mutually exclusive and so are S and -S.

Note 5: The width is defined as the number of words of source data to read.

Note 6: An odd number of bytes can be specified for the source warp.
However, word alignment of source scan lines will result in faster
execution.

The horizontal and vertical directions of the BITBLT
operations performed by the above instructions, with the
exception of BBFOR, are both programmable. The horizontal
direction is controlled by the IA and DA options. The vertical
direction is controlled by the sign of the source and
destination warps. Figure 2-12 and Table 2-3 show the format of
the BB instructions and the encodings for the 'op' and 'i'
fields.

23        16|15         8|7          0
000000DXS0 | op | i      | 00001110

• D is set when the DA option is selected
• S is set when the -S option is selected
• X is set for BBAND, and it is clear for all other BB instructions

FIGURE 2-12. BB Instructions Format

TABLE 2-3. 'op' and 'i' Field Encodings

Instruction  Options  'op' Field  'i' Field
BBAND        Yes      1010        11
BBFOR        No
BBSTOD


BIT-aligned Word Transfer 
Syntax: BITWT 


Setup: R0 Base address, source word
       R1 Base address, destination double word
       R2 Shift value

The BITWT instruction performs a fast logical OR operation
between a source word and a destination double word,
stores the result into the destination double word and
increments registers R0 and R1 by two. Before performing the
OR operation, the source word is shifted left (i.e., in the
direction of increasing bit numbers) by the value in register
R2.

This instruction can be used within the inner loop of a block
OR operation. Its use assumes that the source data is
'clean' and does not need masking. The BITWT format is
shown in Figure 2-13.


FIGURE 2-13. BITWT Instruction Format 
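
The effect of BITWT on memory and on R0/R1 can be modelled with the Python sketch below. It is a behavioural illustration only: the function name and the `bytearray` memory model are hypothetical, the shift value is assumed to lie between 0 and 15, and words are read and written in little-endian byte order, as is usual for Series 32000 memory.

```python
def bitwt(memory: bytearray, r0: int, r1: int, r2: int) -> tuple[int, int]:
    """OR the shifted source word into the destination double word.

    Returns the updated (R0, R1) values.
    """
    src = int.from_bytes(memory[r0:r0 + 2], "little")   # 16-bit source word
    dst = int.from_bytes(memory[r1:r1 + 4], "little")   # 32-bit destination
    dst |= (src << r2) & 0xFFFFFFFF                     # shift left, then OR
    memory[r1:r1 + 4] = dst.to_bytes(4, "little")
    return r0 + 2, r1 + 2                               # both pointers advance by two

mem = bytearray(8)
mem[0:2] = (0x8001).to_bytes(2, "little")  # source word at address 0
r0, r1 = bitwt(mem, r0=0, r1=4, r2=3)      # destination double word at address 4
print(mem[4:8].hex(), r0, r1)
```

Because both pointers advance by one word while the destination access is a double word, successive BITWT operations overlap in the destination, which is what lets shifted source bits spill into the following word.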


External BITBLT 
Syntax: EXTBLT 


Setup: R0 base address, source data
       R1 base address, destination data
       R2 width (in bytes)
       R3 height (in lines)
       R4 horizontal increment/decrement
       R5 temporary register (current width)
       R6 source warp (adjusted)
       R7 destination warp (adjusted)


Note 1: R0 and R1 are updated after execution to point to the last source
and destination addresses plus related warps. R2, R3 and R5 will
be modified. R4, R6, and R7 are returned unchanged.


Note 2: Source and destination pointers should point to word-aligned
operands to maximize speed and minimize external interface logic.

This instruction performs an entire BITBLT operation in
conjunction with an external BITBLT Processing Unit (BPU).
The external BPU Control Register should be loaded by the
software before the instruction is executed (refer to the
DP8510 or DP8511 data sheets for more information on the
BPU). The NS32CG16 generates a series of source read,
destination read and destination write bus cycles until the
entire data block has been transferred. The BITBLT
operation can be performed in either horizontal direction, as
controlled by the sign of the contents of register R4.


Depending on the relative alignment of the source and
destination blocks, an extra source read may be required at the
beginning of each scan line, to load the pipeline register in
the external BPU. The L bit in the PSR register determines
whether the extra source read is performed. If L is 1, no
extra read is performed. The instructions CMPQB 2,1 or
CMPQB 1,2 could be executed to provide the right setting
for the L bit just before executing EXTBLT. Figure 2-14
shows the EXTBLT format. The bus activity for a simple
BITBLT operation is shown in Figure 2-19.


FIGURE 2-14. EXTBLT Instruction Format 


2.4.3.2 Pattern Fill


Only one instruction is in this group. It is usually used for 
clearing RAM and drawing patterns and lines. 


Move Multiple Pattern 
Syntax: MOVMPi 


Setup: R0 base address of the destination
       R1 pointer increment (in bytes)
       R2 number of pattern moves
       R3 source pattern

Note: R1 and R3 are not modified by the instruction. R2 will always be
returned as zero. R0 is modified to reflect the last address into which
a pattern was written.


23        16|15         8|7          0
00000000 00100001 00001110


00000000 00010111 00001110
This instruction stores the pattern in register R3 into the
destination area whose address is in register R0. The
pattern count is specified in register R2. After each store
operation the destination address is changed by the contents of
register R1. This allows the pattern to be stored in rows, in
columns, and in any direction, depending on the value and
sign of R1. The MOVMPi instruction format is shown in
Figure 2-15.


23        16|15         8|7          0
00000000 000111 i 00001110


FIGURE 2-15. MOVMPi Instruction Format 
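
A behavioural sketch of the pattern fill, written in Python, may help show how the sign and size of R1 control the fill direction. The function name, the `size` parameter (operand length in bytes, standing in for the i suffix) and the `bytearray` memory model are assumptions of this sketch; it deliberately does not model the final value of R0, which the note above defines.

```python
def movmp(memory: bytearray, r0: int, r1: int, r2: int, r3: int, size: int) -> None:
    """Store the pattern in R3 into memory R2 times, stepping the address by R1."""
    mask = (1 << (8 * size)) - 1
    pattern = (r3 & mask).to_bytes(size, "little")  # low-order bytes of R3
    while r2 > 0:
        memory[r0:r0 + size] = pattern  # one pattern store
        r0 += r1                        # signed pointer increment
        r2 -= 1

# Byte pattern written to every second byte, moving upward through memory.
mem = bytearray(8)
movmp(mem, r0=0, r1=2, r2=4, r3=0xAA, size=1)
print(mem.hex())

# The same instruction with a negative increment fills downward.
mem = bytearray(4)
movmp(mem, r0=3, r1=-1, r2=2, r3=0xFF, size=1)
print(mem.hex())
```

With R1 set to the warp of the frame buffer (in bytes), the same loop draws a vertical line, which is the "columns" case mentioned in the text.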


2.4.3.3 Data Compression, Expansion and Magnify


The three instructions in this group can be used to com- 
press data and restore data from compression. A com- 
pressed character set may require from 30% to 50% less 
memory space for its storage. 


The compression ratio possible can be 50:1 or higher de- 
pending on the data and algorithm used. TBITS can also be 
used to find boundaries of an object. As a character is need- 
ed, the data is expanded and stored in a RAM buffer. The 
expand instructions (SBITS, SBITPS) can also function as 
line drawing instructions. 


Test Bit String 
Syntax: TBITS option 
Setup: R0 base address, source (byte address)
       R1 starting source bit offset
       R2 destination run length limited code
       R3 maximum value run length limit
       R4 maximum source bit offset

Option: 1 count set bits until a clear bit is found
        0 count clear bits until a set bit is found

Note: R0, R3 and R4 are not modified by the instruction execution. R1
reflects the new bit offset. R2 holds the result.

This instruction starts at the base address, adds a bit offset,
and tests the bit for clear if "option" = 0 (and for set if
"option" = 1). If clear (or set), the instruction increments to
the next higher bit and tests for clear (or set). This testing
for clear proceeds through memory until a set bit is found or
until the maximum source bit offset or maximum run length
value is reached. The total number of clear bits is stored in
the destination as a run length value.


When TBITS finds a set bit and terminates, the bit offset is 
adjusted to reflect the current bit address. Offset is then 
ready for the next TBITS instruction with “option” = 0. After 
the instruction is executed, the F flag is set to the value of 
the bit previous to the bit currently being pointed to (i.e., the 
value of the bit on which the instruction completed execution).
In the case of a starting bit offset exceeding the maximum
bit offset (R1 ≥ R4), the F flag is set if the option was
1 and clear if the option was 0. The L flag is set when the
desired bit is found, or if the run length equalled the maxi- 
mum run length value and the bit was not found. It is cleared 
otherwise. Figure 2-16 shows the TBITS instruction format. 


00000000S010011100001110 

S is set for ‘TBITS 1’ and clear for ‘TBITS 0’. 


FIGURE 2-16. TBITS Instruction Format 
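
The run-counting behaviour of TBITS described above can be sketched in software. This is a simplified model, not the processor's implementation: the function names are hypothetical, bit 0 is assumed to be the least significant bit of the first byte, the returned offset is assumed to point at the bit that ended the run, and the F and L flags are not modelled.

```python
def get_bit(memory, bit_offset):
    """Return one bit, assuming bit 0 is the LSB of byte 0 (an assumption)."""
    return (memory[bit_offset // 8] >> (bit_offset % 8)) & 1


def tbits_model(memory, start_offset, option, max_run, max_offset):
    """Count consecutive bits equal to `option`, starting at start_offset.

    Returns (run_length, new_offset), loosely standing in for R2 and R1.
    Counting stops at the opposite bit, at max_offset (R4) or at max_run (R3).
    """
    run = 0
    offset = start_offset
    while offset < max_offset and run < max_run:
        if get_bit(memory, offset) != option:
            break  # the opposite bit ends the run
        run += 1
        offset += 1
    return run, offset


row = bytes([0b00001000])  # bits 0-2 clear, bit 3 set, bits 4-7 clear
print(tbits_model(row, 0, 0, 100, 8))  # clear run starting at offset 0
print(tbits_model(row, 3, 1, 100, 8))  # set run starting at offset 3
```

Alternating the option between calls, and feeding each returned offset into the next call, yields the sequence of run lengths for a scan line.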




Set Bit String 
Syntax: SBITS 
Setup:  R0 base address of the destination 
        R1 starting bit offset (signed) 
        R2 number of bits to set (unsigned) 
        R3 address of string look-up table 
Note: When the instruction terminates, the registers are returned unchanged. 

SBITS sets a number of contiguous bits in memory to 1, and 
is typically used for data expansion operations. The instruction 
draws the number of ones specified by the value in R2, 
starting at the bit address provided by registers R0 and R1. 
In order to maximize speed and allow drawing of patterned 
lines, an external 1k byte lookup table is used. The lookup 
table is specified in the NS32CG16 Printer/Display Processor 
Programmer’s Reference Supplement. 


When SBITS begins executing, it compares the value in R2 
with 25. If the value in R2 is less than or equal to 25, the F 
flag is cleared and the appropriate number of bits are set in 
memory. If R2 is greater than 25, the F flag is set and no 
other action is performed. This allows the software to use a 
faster algorithm to set longer strings of bits. Figure 2-17 
shows the SBITS instruction format. 
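
As an illustration only, the documented SBITS behaviour (set up to 25 contiguous bits, otherwise set F and do nothing) might be modelled as follows. The function name is hypothetical, the bit offset is assumed to be non-negative, bit 0 is assumed to be the LSB of byte 0, and the external lookup table is not modelled.

```python
SBITS_MAX_BITS = 25  # threshold stated in the text


def sbits_model(memory, bit_offset, count):
    """Set `count` contiguous bits in `memory` (a bytearray).

    Returns the modelled F flag: True if count exceeds 25, in which case
    memory is left untouched so the caller can use a faster algorithm.
    """
    if count > SBITS_MAX_BITS:
        return True
    for bit in range(bit_offset, bit_offset + count):
        memory[bit // 8] |= 1 << (bit % 8)
    return False


line = bytearray(4)
f_flag = sbits_model(line, 6, 4)  # sets bits 6, 7, 8 and 9
print(f_flag, line.hex())
```

A caller would test the returned flag and fall back to byte or word fills when the run is longer than 25 bits.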


000000000011011100001110 

FIGURE 2-17. SBITS Instruction Format 

Set Bit Perpendicular String 
Syntax: SBITPS 
Setup:  R0 base address, destination (byte address) 
        R1 starting bit offset 
        R2 number of bits to set 
        R3 destination warp (signed value, in bits) 
Note: When the instruction terminates, the R0 and R3 registers are returned unchanged. R1 becomes the final bit offset. R2 is zero. 

SBITPS can be used to set a string of bits in any direction. 
This allows a font to be expanded with a 90 or 270 
degree rotation, as may be required in a printer application. 
SBITPS sets a string of bits starting at the bit address specified 
in registers R0 and R1. The number of bits in the string 
is specified in R2. After the first bit is set, the destination 
warp is added to the bit address and the next bit is set. The 
process is repeated until all the bits have been set. A negative 
raster warp offset value leads to a 90 degree rotation. A 
positive raster warp value leads to a 270 degree rotation. If 
the R3 value is (space warp +1 or -1), then the result is 
a 45 degree line. If the R3 value is +1 or -1, a horizontal 
line results. 


SBITS and SBITPS allow expansion on any 90 degree an- 
gle, giving portrait, landscape and mirror images from one 
font. Figure 2-18 shows the SBITPS instruction format. 
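
The stepping behaviour of SBITPS can be sketched as below. This is a minimal model under stated assumptions: the function name is hypothetical, bit 0 is the LSB of byte 0, all offsets stay inside the buffer, and the returned value is the offset after the last warp addition (the text does not say exactly which offset R1 holds on completion).

```python
def sbitps_model(memory, bit_offset, count, warp):
    """Set `count` bits, adding `warp` (in bits) to the offset after each one.

    `memory` is a bytearray. Returns the offset after the final addition.
    """
    for _ in range(count):
        memory[bit_offset // 8] |= 1 << (bit_offset % 8)
        bit_offset += warp
    return bit_offset


# A raster 16 bits wide and 4 lines high: a warp of 16 draws a vertical line.
raster = bytearray(8)
sbitps_model(raster, 3, 4, 16)
print(raster.hex())

# A warp of 16 + 1 steps one line down and one bit across: a 45 degree line.
diagonal = bytearray(8)
sbitps_model(diagonal, 0, 4, 17)
print(diagonal.hex())
```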


000000000010111100001110 

FIGURE 2-18. SBITPS Instruction Format 

[Timing diagram: for each of WORD 1 through WORD 4 (12 clocks each), the bus performs READ SOURCE, READ DESTINATION and WRITE RESULT TO DESTINATION.] 

TL/EE/9424-66 

FIGURE 2-19. Bus Activity for a Simple BITBLT Operation 

Note 1: This example is for a block 4 words wide and 1 line high. 
Note 2: The sequence is common with all logical operations of the DP8510/DP8511 BPU. 
Note 3: Mask values, shift values and number of bit planes do not affect the performance. 
Note 4: Zero wait states are assumed throughout the BITBLT operation. 
Note 5: The extra read is performed when the BPU pipeline register needs to be preloaded. 


B.3.3.1 Magnifying Compressed Data 


Restoring data is just one application of the SBITS and 
SBITPS instructions. Multiplying the “length” operand used 
by the SBITS and SBITPS instructions causes the resulting 
pattern to be wider, or a multiple of “length”. 


As the pattern of data is expanded, it can be magnified by 
2x, 3x, 4x, ... , 10x and so on. This creates several sizes of 
the same style of character, or changes the size of a logo. A 
magnify in both dimensions X and Y can be accomplished 
by drawing a single line, then using the MOVS (Move String) 
or the BB instructions to duplicate the line, maintaining an 
equal aspect ratio. 
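
The idea can be illustrated with a small model that works on lists of bits rather than on the SBITS, SBITPS or MOVS instructions themselves. The helper names are hypothetical, and the sketch assumes each scan line is stored as alternating run lengths beginning with a run of clear bits.

```python
def expand_runs(run_lengths, magnify=1):
    """Expand alternating run lengths into bits, widening each run."""
    row = []
    bit = 0  # assumption: the first run is a run of clear bits
    for length in run_lengths:
        row.extend([bit] * (length * magnify))
        bit ^= 1
    return row


def magnify_rows(rows_of_runs, magnify=1):
    """Magnify in X by scaling run lengths and in Y by repeating each line."""
    image = []
    for runs in rows_of_runs:
        row = expand_runs(runs, magnify)
        image.extend(list(row) for _ in range(magnify))
    return image


for line in magnify_rows([[1, 2, 1], [2, 2]], magnify=2):
    print("".join(str(b) for b in line))
```

Repeating each expanded line the same number of times as the horizontal factor keeps the aspect ratio equal, which mirrors the line duplication step described above.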


More information on this subject is provided in the 
NS32CG16 Printer/Display Processor Programmer’s Refer- 
ence Supplement. 


3.0 Functional Description 


3.1 POWER AND GROUNDING 


The NS32CG16 requires a single 5-Volt power supply, applied 
on 5 pins. The logic voltage pin (VCCL) supplies the 
power to the on-chip logic. The buffer voltage pins 
VCCCTTL, VCCFCLK, VCCAD, and VCCIO supply the power 
to the on-chip output drivers. 


Grounding connections are made on 6 pins. The Logic 
Ground Pin (VSSL) provides the ground connection to the 
on-chip logic. The buffer ground pins VSSFCLK, VSSNTSO, 
VSSHAD, VSSLAD, VSSIO are the ground pins for the 
on-chip output drivers. 


For optimal noise immunity, the power and ground pins 
should be connected to VCC and ground planes respectively. 
If VCC and ground planes are not used, single conductors 
should be run directly from each VCC pin to a power point, 
and from each GND pin to a ground point. Daisy-chained 
connections should be avoided. 


Decoupling capacitors should also be used to keep the 
noise level to a minimum. Standard 0.1 µF ceramic capacitors 
can be used for this purpose. In addition, a 1.0 µF 
tantalum capacitor should be connected between VCCL and 
ground. They should attach to VCC, VSS pairs as close as 
possible to the NS32CG16. 


During prototyping using wire-wrap or similar methods, the 
capacitors should be soldered directly to the power pins of 
the NS32CG16 socket, or as close as possible, with very 
short leads. 


Recommended bypass for production in printed circuit 
boards: 


+5         Ground     Capacitors 

VCCL       VSSL       0.1 µF Disk Ceramic 
                      1.0 µF Tantalum 
VCCIO      VSSIO      0.1 µF 
VCCCTTL    VSSNTSO    0.1 µF 
VCCAD      VSSLAD     0.1 µF 
VCCAD      VSSHAD     None 
VCCFCLK    VSSFCLK    0.1 µF 


VCCL-VSSL bypass requires a very short lead length and 
low inductance on the 0.1 µF capacitor. 


Design Notes 


When constructing a board using high frequency clocks with 
multiple lines switching, special care should be taken to 
avoid resonances on signal lines. A separate power and 
ground layer is recommended. This is true when designing 
boards for the NS32CG16. Switching times of under 5 ns on 
some lines are possible. Resonant frequencies should be 
maintained well above the 200 MHz frequency range on 
signal paths by keeping traces short and inductance low. 
Loading capacitance at the end of a transmission line contributes 
to the resonant frequency and should be minimized 
if possible. Capacitors should be located as close as possible 
across each power and ground pair near the 
NS32CG16. 


Power and ground connections are shown in Figure 3-1. 

[Figure: +5V connects to VCCL and, together with the other VCC connections (VCC plane), to VCCCTTL, VCCFCLK, VCCAD and VCCIO. Ground connects to VSSL and, together with the other ground connections (GND plane), to VSSFCLK, VSSNTSO, VSSHAD, VSSLAD and VSSIO.] 

TL/EE/9424-7 

FIGURE 3-1. Power and Ground Connections 


3.2 CLOCKING 


The NS32CG16 provides an internal oscillator that interacts 
with an external clock source through two signals: OSCIN 
and OSCOUT. 


Either an external single-phase clock signal or a crystal can 
be used as the clock source. If a single-phase clock source 
is used, only the connection on OSCIN is required; 
OSCOUT should be left open. The voltage level require- 
ments specified in Section 4.3 must also be met for proper 
operation. 


When operation with a crystal is desired, a fundamental 
mode crystal should be used. In this case, special care 
should be taken to minimize stray capacitances and induc- 
tances, especially when operating at a crystal frequency of 
30 MHz. The crystal, as well as the external RC compo- 
nents, should be placed in close proximity to the OSCIN and 
OSCOUT pins to keep the printed circuit trace lengths to an 
absolute minimum. Figure 3-2 shows the external crystal 
interconnections. Table 3-1 provides the crystal characteris- 
tics and the values of the RC components required for vari- 
ous frequencies. 


OSCIN 


OSCOUT 


TL/EE/9424-8 
FIGURE 3-2. Crystal Interconnections 




TABLE 3-1. External Oscillator Specifications 
Crystal Characteristics 

Type ............................ AT-Cut 
Tolerance ....................... 0.005% at 25°C 
Stability ....................... 0.01% from 0°C to 70°C 
Resonance ....................... Fundamental (parallel) 
Capacitance ..................... 20 pF 
Maximum Series Resistance ....... 50Ω 


RC Values 


470 


360 
270 
220 
180 


3.2.1 Power Save Mode 


The NS32CG16 provides a power save feature that can be 
used to significantly reduce the power consumption at times 
when the computational demand decreases. The device 
uses the clock signal at the OSCIN pin to derive the internal 
clock as well as the external signals PHI1, PHI2, CTTL and 
FCLK. The frequency of all these clock signals is affected 
by the clock scaling factor. Scaling factors of 1, 2, 4 or 8 can 
be selected by properly setting the C and M bits in the CFG 
register. 

Upon reset, both C and M are set to zero, thus maximum 
clock rate is selected. 

Because the C and M bits are programmed by the 
SETCFG instruction, the power save feature can only be 
controlled by programs running in supervisor mode. 

The following table shows the C and M bit settings for the 
various scaling factors, and the resulting supply current for a 
crystal frequency of 30 MHz. 


Clock Scaling Factor vs Supply Current 


Scaling    CPU Clock     Typical ICC 
Factor     Frequency     at +5V 

1          15 MHz 
2          7.5 MHz 
4          3.75 MHz 
8          1.88 MHz 
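
The CPU clock frequencies quoted for a 30 MHz crystal follow from dividing the oscillator frequency by two and then by the scaling factor. The sketch below assumes that relationship holds in general; the function name is hypothetical, and the C and M bit encodings are not modelled.

```python
VALID_SCALING_FACTORS = (1, 2, 4, 8)


def cpu_clock_hz(oscin_hz, scaling_factor=1):
    """CPU clock, assuming it equals OSCIN / 2 / scaling factor."""
    if scaling_factor not in VALID_SCALING_FACTORS:
        raise ValueError("scaling factor must be 1, 2, 4 or 8")
    return oscin_hz / (2 * scaling_factor)


for factor in VALID_SCALING_FACTORS:
    print(factor, cpu_clock_hz(30_000_000, factor) / 1e6, "MHz")
```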


3.3 RESETTING 


The RSTI input pin is used to reset the NS32CG16. The 
CPU samples RSTI on the falling edge of CTTL. 


Whenever a low level is detected, the CPU responds imme- 
diately. Any instruction being executed is terminated; any 
results that have not yet been written to memory are dis- 
carded; and any pending interrupts and traps are eliminated. 
The internal latch for the edge-sensitive NMI signal is 
cleared. 


On application of power, RSTI must be held low for at least 
50 µs after VCC is stable. This is to ensure that all on-chip 
voltages are completely stable before operation. Whenever 
a Reset is applied, it must also remain active for not less 
than 64 CTTL cycles. See Figures 3-3 and 3-4. 
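
A board designer might check a reset circuit against both limits with a calculation like the following. It is a sketch under assumptions: the function name is hypothetical, and CTTL is taken to run at OSCIN / 2 divided by whatever clock scaling factor is in effect when reset is applied.

```python
POWER_ON_HOLD_S = 50e-6   # at least 50 microseconds after VCC is stable
MIN_RESET_CTTL_CYCLES = 64


def min_reset_low_seconds(oscin_hz, power_on, scaling_factor=1):
    """Shortest RSTI low time that satisfies the stated requirements."""
    cttl_hz = oscin_hz / (2 * scaling_factor)
    cycles_time = MIN_RESET_CTTL_CYCLES / cttl_hz
    if power_on:
        return max(POWER_ON_HOLD_S, cycles_time)
    return cycles_time


print(min_reset_low_seconds(30e6, power_on=True))
print(min_reset_low_seconds(30e6, power_on=False))
```

At 30 MHz the 64-cycle requirement is only a few microseconds, so the 50 µs power-on hold dominates; with a slow clock the cycle count can become the larger of the two.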
While in the Reset state, the CPU drives the signals ADS, 
RD, WR, DBE, TSO, BPU, and DDIN inactive. AD0-AD15, 
A16-A23 and SPC are floated, and the state of all other 
output signals is undefined. 


The internal CPU clock, PHI1, PHI2 and CTTL all run at half 
the frequency of the signal on the OSCIN pin. FCLK runs at 
the same frequency as OSCIN. 


The HOLD signal must be kept inactive. After the RSTI sig- 
nal is driven high, the CPU will stay in the reset condition for 
approximately 8 clock cycles and then it will begin execution 
at address 0. 


The PSR is reset to 0. The CFG C and M bits are reset to 0. 
NMI is enabled to allow Non-Maskable Interrupts. The fol- 
lowing conditions are present after reset due to the PSR 
being reset to 0: 


Tracing is disabled. 

Supervisor mode is enabled. 

Supervisor stack space is used when the TOS addressing 
mode is indicated. 

No trace traps are pending. 

Only NMI is enabled. INT is not enabled. 

BPU is inactive high. 

The Clock Scaling Factor is set to 1, refer to Section 3.2.1. 


Note that vectored/non-vectored interrupts have not been selected. 
While interrupts are disabled, a SETCFG [I] instruction 
must be executed to declare the presence of the 
NS32202 if vectored interrupts are desired. If non-vectored 
interrupts are required, a SETCFG without the [I] must be 
executed. 


The presence/absence of the NS32081 or NS32381 has 
also not been declared. If there is a Floating Point Unit, a 
SETCFG [F] instruction must be executed. If there is no 
floating point unit, a SETCFG without the [F] must be exe- 
cuted. 


In general, a SETCFG instruction must be executed in the 
reset routine, in order to properly configure the CPU. The 
options should be combined, and executed in a single instruction. 
For example, to declare vectored interrupts, a 
Floating Point unit installed, and full CPU clock rate, execute 
a SETCFG [F, I] instruction. To declare non-vectored interrupts, 
no FPU, and full CPU clock rate, execute a 
SETCFG [ ] instruction. 


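[Timing diagram: after VCC is stable, RSTI is held low for ≥ 50 µs and for ≥ 64 clock cycles.] 
TL/EE/9424-9 
FIGURE 3-3. Power-On Reset Requirements 


[Timing diagram: RSTI is held low for ≥ 64 clock cycles.] 
TL/EE/9424-10 
FIGURE 3-4. General Reset Timing 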


3.4 BUS CYCLES 


The CPU will perform a bus cycle for one of the following 
reasons: 


1) To write or read data, to or from memory or peripheral 
devices. Peripheral input and output are memory- 
mapped in the Series 32000 family. 

2) To fetch instructions into the eight-byte instruction 
queue. This happens whenever the bus would otherwise 
be idle and the queue is not already full. 

3) To acknowledge an interrupt and allow external circuitry 
to provide a vector number, or to acknowledge completion 
of an interrupt service routine. 

4) To transfer information to or from a Slave Processor. 


In terms of bus timing, cases 1 through 3 above are identical. 
For timing specifications, see Section 4. The only external 
difference between them is the four-bit code placed on 
the Bus Status pins (ST0-ST3). Slave Processor cycles differ 
in that separate control signals are applied (Section 
3.4.7). 


3.4.1 Bus Status 

The NS32CG16 CPU presents four bits of Bus Status information 
on pins ST0-ST3. The various combinations on 
these pins indicate why the CPU is performing a bus cycle, 
or, if it is idle on the bus, then why it is idle. 

The Bus Status pins are interpreted as a four-bit value, with 
ST0 the least significant bit. Their values decode as follows: 


0000 - The bus is idle because the CPU does not need 
       to perform a bus access. 

0001 - The bus is idle because the CPU is executing 
       the WAIT instruction. 

0010 - (Reserved for future use.) 

0011 - The bus is idle because the CPU is waiting for a 
       Slave Processor to complete an instruction. 

0100 - Interrupt Acknowledge, Master. 
       The CPU is performing a Read cycle to acknowledge 
       an interrupt request. See Section 3.4.6. 

0101 - Interrupt Acknowledge, Cascaded. 
       The CPU is reading an interrupt vector to acknowledge 
       a maskable interrupt request from a 
       Cascaded Interrupt Control Unit. 

0110 - End of Interrupt, Master. 
       The CPU is performing a Read cycle to indicate 
       that it is executing a Return from Interrupt 
       (RETI) instruction at the completion of an interrupt’s 
       service procedure. 

0111 - End of Interrupt, Cascaded. 
       The CPU is performing a read cycle from a Cascaded 
       Interrupt Control Unit to indicate that it is 
       executing a Return from Interrupt (RETI) instruction 
       at the completion of an interrupt’s 
       service procedure. 

1000 - Sequential Instruction Fetch. 
       The CPU is reading the next sequential word 
       from the instruction stream into the Instruction 
       Queue. It will do so whenever the bus would 
       otherwise be idle and the queue is not already 
       full. 

1001 - Non-Sequential Instruction Fetch. 
       The CPU is performing the first fetch of instruction 
       code after the Instruction Queue is purged. 
       This will occur as a result of any jump or branch, 
       any interrupt or trap, or execution of certain instructions. 

1010 - Data Transfer. 
       The CPU is reading or writing an operand of an 
       instruction. 

1011 - Read RMW Operand. 
       The CPU is reading an operand which will subsequently 
       be modified and rewritten. The write 
       cycle of RMW will have a “write” status. 

1100 - Read for Effective Address Calculation. 
       The CPU is reading information from memory in 
       order to determine the Effective Address of an 
       operand. This will occur whenever an instruction 
       uses the Memory Relative or External addressing 
       mode. 

1101 - Transfer Slave Processor Operand. 
       The CPU is either transferring an instruction operand 
       to or from a Slave Processor, or it is issuing 
       the Operation Word of a Slave Processor 
       instruction. See Section 3.9.1. 

1110 - Read Slave Processor Status. 
       The CPU is reading a Status Word from a Slave 
       Processor after the Slave Processor has signalled 
       completion of an instruction. 

1111 - Broadcast Slave ID. 
       The CPU is initiating the execution of a Slave 
       Processor instruction by transferring the first 
       byte of the instruction, which represents the 
       slave processor identification. 


3.4.2 Basic Read and Write Cycles 


The sequence of events occurring during a CPU access to 
either memory or peripheral device is shown in Figure 3-6 
for a read cycle, and Figure 3-7 for a write cycle. 


The cases shown assume that the selected memory or peripheral 
device is capable of communicating with the CPU at 
full speed. If not, then cycle extension may be requested 
through CWAIT and/or WAIT1-2. 


A full-speed bus cycle is performed in four cycles of the 
CTTL clock signal, labeled T1 through T4. Clock cycles not 
associated with a bus cycle are designated Ti (for “Idle”). 


During T1, the CPU applies an address on pins AD0-AD15 
and A16-A23. It also provides a low-going pulse on the 
ADS pin, which serves the dual purpose of informing external 
circuitry that a bus cycle is starting and of providing control 
to an external latch for demultiplexing Address bits 0-15 
from the AD0-AD15 pins. See Figure 3-5. During this 
time also the status signals DDIN, indicating the direction of 
the transfer, and HBE, indicating whether the high byte 
(AD8-AD15) is to be referenced, become valid. 


During T2 the CPU switches the Data Bus, AD0-AD15, to 
either accept or present data. Note that the signals A16-A23 
remain valid, and need not be latched. 
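
For bus monitoring or logic-analyzer post-processing, the Bus Status values listed in Section 3.4.1 can be held in a lookup table. The sketch below is illustrative; the dictionary and function names are hypothetical, and pin levels are assumed to be passed in as 0 or 1.

```python
BUS_STATUS = {
    0b0000: "Idle",
    0b0001: "Idle: WAIT instruction",
    0b0010: "Reserved",
    0b0011: "Idle: waiting for Slave Processor",
    0b0100: "Interrupt Acknowledge, Master",
    0b0101: "Interrupt Acknowledge, Cascaded",
    0b0110: "End of Interrupt, Master",
    0b0111: "End of Interrupt, Cascaded",
    0b1000: "Sequential Instruction Fetch",
    0b1001: "Non-Sequential Instruction Fetch",
    0b1010: "Data Transfer",
    0b1011: "Read RMW Operand",
    0b1100: "Read for Effective Address Calculation",
    0b1101: "Transfer Slave Processor Operand",
    0b1110: "Read Slave Processor Status",
    0b1111: "Broadcast Slave ID",
}


def status_value(st3, st2, st1, st0):
    """Combine the four pin levels into one value, ST0 least significant."""
    return (st3 << 3) | (st2 << 2) | (st1 << 1) | st0


def decode_bus_status(value):
    """Return the description for a four-bit Bus Status value."""
    return BUS_STATUS[value & 0xF]


print(decode_bus_status(status_value(1, 0, 0, 1)))
```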


[Figure: the NS32CG16 with an external buffer on AD0-AD15, showing the DDIN, HBE and A0 (LBE) signals.] 

TL/EE/9424-11 
FIGURE 3-5. Bus Connections 
FIGURE 3-6. Read Cycle Timing 

FIGURE 3-7. Write Cycle Timing 

At this time the signals TSO (Timing State Output), DBE 
(Data Buffer Enable) and either RD (Read Strobe) or WR 
(Write Strobe) will also be activated. 

The T3 state provides for access time requirements, and it 
occurs at least once in a bus cycle. At the end of T2, on the 
rising edge of CTTL, the CWAIT and WAIT1-2 signals are 
sampled to determine whether the bus cycle will be extended. 
See Section 3.4.3. 


If the CPU is performing a read cycle, the data bus 
(AD0-AD15) is sampled at the beginning of T4 on the rising 
edge of CTTL. Data must, however, be held a little longer to 
meet the data hold time requirements. The RD signal is 
guaranteed not to go inactive before this time, so its rising 
edge can be safely used to disable the device providing the 
input data. 


The T4 state finishes the bus cycle. At the beginning of T4, 
the RD or WR, and TSO signals go inactive, and on the 
falling edge of CTTL, DBE goes inactive, having provided for 
necessary data hold times. Data during Write cycles remains 
valid from the CPU throughout T4. Note that the Bus 
Status lines (ST0-ST3) change at the beginning of T4, anticipating 
the following bus cycle (if any). 


3.4.3 Cycle Extension 


To allow sufficient access time for any speed of memory or 
peripheral device, the NS32CG16 provides for extension of 
a bus cycle. Any type of bus cycle except a Slave Processor 
cycle can be extended. 


In Figures 3-6 and 3-7, note that during T3 all bus control 
signals from the CPU are flat. Therefore, a bus cycle can be 
cleanly extended by causing the T3 state to be repeated. 
This is the purpose of the WAIT1-2 and CWAIT input signals. 


At the end of state T2, on the rising edge of CTTL, WAIT1-2 
and CWAIT are sampled. 


If any of these signals are active, the bus cycle will be extended 
by at least one clock cycle. Thus, one or more additional 
T3 states (also called wait states) will be inserted after 
the next T-State. Any combination of the above signals can 
be activated at one time. However, the WAIT1-2 inputs are 
only sampled by the CPU at the end of state T2. They are 
ignored at all other times. 


The WAIT1-2 inputs are binary weighted, and can be used 
to insert up to 3 wait states, according to the following table. 


CWAIT causes wait states to be inserted continuously as 
long as it is sampled active. It is normally used when the 
number of wait states to be inserted in the CPU bus cycle is 
not known in advance. 


The following sequence shows the CPU response to the 
WAIT1-2 and CWAIT inputs. 


1. Start bus cycle. 
2. Sample WAIT1-2 and CWAIT at the end of state T2. 
3. If the WAIT1-2 inputs are both inactive, then go to step 6. 
4. Insert the number of wait states selected by WAIT1-2. 
5. Sample CWAIT again. 
6. If CWAIT is not active, then go to step 8. 
7. Insert one wait state and then go to step 5. 
8. Complete bus cycle. 


Figure 3-8 shows a bus cycle extended by three wait states, 
two of which are due to WAIT2, and one is due to CWAIT. 
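
The eight-step sequence above can be traced with a short simulation. It is a sketch, not a timing-accurate model: the function names are hypothetical, inputs are expressed as "active" booleans rather than pin levels, and the binary weights (WAIT1 = 1 wait state, WAIT2 = 2) are an assumption inferred from the "binary weighted" description and the Figure 3-8 example.

```python
def count_wait_states(wait1_active, wait2_active, cwait_active_after):
    """Return the number of wait states inserted into one bus cycle.

    cwait_active_after(n) reports whether CWAIT is sampled active once
    n wait states have been inserted.
    """
    inserted = 0
    # Step 2: sample WAIT1-2 and CWAIT at the end of T2.
    selected = (1 if wait1_active else 0) + (2 if wait2_active else 0)
    cwait = cwait_active_after(inserted)
    if selected:
        inserted += selected                  # step 4
        cwait = cwait_active_after(inserted)  # step 5
    while cwait:                              # step 6
        inserted += 1                         # step 7
        cwait = cwait_active_after(inserted)  # back to step 5
    return inserted                           # step 8: complete bus cycle


def bus_cycle_clocks(wait_states):
    """A full-speed cycle takes four CTTL clocks (T1-T4), plus wait states."""
    return 4 + wait_states


# The Figure 3-8 case: two wait states from WAIT2, then one from CWAIT.
n = count_wait_states(False, True, lambda inserted: inserted < 3)
print(n, bus_cycle_clocks(n))
```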


3.4.4 Data Access Sequences 


The 24-bit address provided by the NS32CG16 is a byte 
address; that is, it uniquely identifies one of up to 
16,777,216 eight-bit memory locations. An important feature 
of the NS32CG16 is that the presence of a 16-bit data bus 
imposes no restrictions on data alignment; any data item, 
regardless of size, may be placed starting at any memory 
address. The NS32CG16 provides a special control signal, 
High Byte Enable (HBE), which facilitates individual byte ad- 
dressing on a 16-bit bus. 


Memory is organized as two eight-bit banks, each bank receiving 
the word address (A1-A23) in parallel. One bank, 
connected to Data Bus pins AD0-AD7, is enabled to respond 
to even byte addresses; i.e., when the least significant 
address bit (A0) is low. The other bank, connected to 
Data Bus pins AD8-AD15, is enabled when HBE is low. See 
Figure 3-9. 


[Figure: two 8-bit memory banks, enabled by HBE and A0 (LBE) respectively, together forming the 16-bit data bus.] 

TL/EE/9424-15 
FIGURE 3-9. Memory Interface 

Any bus cycle falls into one of three categories: Even Byte 
Access, Odd Byte Access, and Even Word Access. All accesses 
to any data type are made up of sequences of these 
cycles. Table 3-2 gives the state of A0 and HBE for each 
category. 


TABLE 3-2. Bus Cycle Categories 


Category     HBE   A0 

Even Byte    1     0 
Odd Byte     0     1 
Even Word    0     0 




Accesses of operands requiring more than one bus cycle 
are performed sequentially, with no idle T-States separating 
them. The number of bus cycles required to transfer an operand 
depends on its size and its alignment (i.e., whether it 
starts on an even byte address or an odd byte address). 
Table 3-3 lists the bus cycles performed for each situation. 


For the timing of A0 and HBE, see Section 3.4.2. 
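
The way an operand breaks down into bus cycles can be sketched as follows. The function name is hypothetical, and the rule that a quad-word is handled as two double-word halves is inferred from the odd quad-word sequence in Table 3-3 rather than stated in the text.

```python
# HBE and A0 levels for each bus cycle category, as given in Table 3-2.
CATEGORY_SIGNALS = {
    "Even Byte": {"HBE": 1, "A0": 0},
    "Odd Byte": {"HBE": 0, "A0": 1},
    "Even Word": {"HBE": 0, "A0": 0},
}


def bus_cycles(address, size):
    """Return the list of (category, address) cycles for one operand.

    `size` is the operand size in bytes: 1, 2, 4 or 8.
    """
    if size == 8:
        # Assumption: a quad-word is accessed as two double-word halves.
        return bus_cycles(address, 4) + bus_cycles(address + 4, 4)
    cycles = []
    end = address + size
    while address < end:
        if address % 2 == 1:
            cycles.append(("Odd Byte", address))
            address += 1
        elif end - address >= 2:
            cycles.append(("Even Word", address))
            address += 2
        else:
            cycles.append(("Even Byte", address))
            address += 1
    return cycles


for category, addr in bus_cycles(0x1001, 4):  # odd double-word
    print(category, hex(addr), CATEGORY_SIGNALS[category])
```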


3.4.4.1 Bit Accesses 


The Bit Instructions perform byte accesses to the byte con- 
taining the designated bit. The Test and Set Bit instruction 
(SBIT), for example, reads a byte, alters it, and rewrites it, 
having changed the contents of one bit. 


3.4.4.2 Bit Field Accesses 


An access to a Bit Field in memory always generates a Dou- 
ble-Word transfer at the address containing the least signifi- 
cant bit of the field. The Double Word is read by an Extract 
instruction; an Insert instruction reads a Double Word, modi- 
fies it, and rewrites it. 


3.4.4.3 Extending Multiply Accesses 


The Multiply Extended Integer (MEI) instruction will return a 
result which is twice the size in bytes of the operand it 
reads. If the multiplicand is in memory, the most-significant 
half of the result is written first (at the higher address), then 
the least-significant half. 


3.4.5 Instruction Fetches 

Instructions for the NS32CG16 CPU are “prefetched”; that 
is, they are input before being needed into the next available 
entry of the eight-byte Instruction Queue. The CPU performs 
two types of Instruction Fetch cycles: Sequential and Non-Sequential. 
These can be distinguished from each other by 
their differing status combinations on pins ST0-ST3 (Section 
3.4.1). 


A Sequential Fetch will be performed by the CPU whenever 
the Data Bus would otherwise be idle and the Instruction 
Queue is not currently full. Sequential Fetches are always 
Even Word Read cycles (Table 3-2). 


A Non-Sequential Fetch occurs as a result of any break in 
the normally sequential flow of a program. Any jump or 
branch instruction, a trap or an interrupt will cause the next 
Instruction Fetch cycle to be Non-Sequential. In addition, 
certain instructions flush the instruction queue, causing the 
next instruction fetch to display Non-Sequential status. Only 
the first bus cycle after a break displays Non-Sequential 
status, and that cycle is either an Even Word Read or an 
Odd Byte Read, depending on whether the destination ad- 
dress is even or odd. 


3.4.6 Interrupt Control Cycles 


Activating the INT or NMI pin on the CPU will initiate one or 
more bus cycles whose purpose is interrupt control rather 
than the transfer of instructions or data. Execution of the 
Return from Interrupt instruction (RETI) will also cause Interrupt 
Control bus cycles. These differ from instruction or data 
transfers only in the status presented on pins ST0-ST3. All 
Interrupt Control cycles are single-byte Read cycles. 


Table 3-4 shows the Interrupt Control sequences associat- 
ed with each interrupt and with the return from its service 
routine. For full details of the NS32CG16 interrupt structure, 
see Section 3.8. 


TABLE 3-3. Access Sequences 


Cycle  Type       Address  HBE  A0  High Bus    Low Bus 


A. Odd Word Access Sequence 

       BYTE 1 | BYTE 0 

1      Odd Byte   A        0    1   Byte 0      Don’t Care 
2      Even Byte  A+1      1    0   Don’t Care  Byte 1 


B. Even Double-Word Access Sequence 

       BYTE 3 | BYTE 2 | BYTE 1 | BYTE 0 

1      Even Word  A        0    0   Byte 1      Byte 0 
2      Even Word  A+2      0    0   Byte 3      Byte 2 


C. Odd Double-Word Access Sequence 

       BYTE 3 | BYTE 2 | BYTE 1 | BYTE 0 

1      Odd Byte   A        0    1   Byte 0      Don’t Care 
2      Even Word  A+1      0    0   Byte 2      Byte 1 
3      Even Byte  A+3      1    0   Don’t Care  Byte 3 


D. Even Quad-Word Access Sequence 

       BYTE 7 | BYTE 6 | BYTE 5 | BYTE 4 | BYTE 3 | BYTE 2 | BYTE 1 | BYTE 0 

1      Even Word  A        0    0   Byte 1      Byte 0 
2      Even Word  A+2      0    0   Byte 3      Byte 2 
Other bus cycles (instruction prefetch or slave) can occur here. 
3      Even Word  A+4      0    0   Byte 5      Byte 4 
4      Even Word  A+6      0    0   Byte 7      Byte 6 


E. Odd Quad-Word Access Sequence 

       BYTE 7 | BYTE 6 | BYTE 5 | BYTE 4 | BYTE 3 | BYTE 2 | BYTE 1 | BYTE 0 

1      Odd Byte   A        0    1   Byte 0      Don’t Care 
2      Even Word  A+1      0    0   Byte 2      Byte 1 
3      Even Byte  A+3      1    0   Don’t Care  Byte 3 
Other bus cycles (instruction prefetch or slave) can occur here. 
4      Odd Byte   A+4      0    1   Byte 4      Don’t Care 
5      Even Word  A+5      0    0   Byte 6      Byte 5 
6      Even Byte  A+7      1    0   Don’t Care  Byte 7 
TABLE 3-4. Interrupt Sequences 


Cycle  Status  Address    DDIN  HBE      A0       High Bus    Low Bus 


A. Non-Maskable Interrupt Control Sequence 

Interrupt Acknowledge 
1      0100    FFFF00₁₆   0     1        0        Don’t Care  Don’t Care 

Interrupt Return 
None: Performed through Return from Trap (RETT) instruction. 


B. Non-Vectored Interrupt Control Sequence 

Interrupt Acknowledge 
1      0100    FFFE00₁₆   0     1        0        Don’t Care  Don’t Care 

Interrupt Return 
None: Performed through Return from Trap (RETT) instruction. 


C. Vectored Interrupt Sequence: Non-Cascaded 

Interrupt Acknowledge 
1      0100    FFFE00₁₆   0     1        0        Don’t Care  Vector: Range: 0-127 

Interrupt Return 
1      0110    FFFE00₁₆   0     1        0        Don’t Care  Vector: Same as in Previous Int. Ack. Cycle 


D. Vectored Interrupt Sequence: Cascaded 

Interrupt Acknowledge 
1      0100    FFFE00₁₆   0     1        0        Don’t Care  Cascade Index: range -16 to -1 
(The CPU here uses the Cascade Index to find the Cascade Address.) 
2      0101    Cascade    0     1 or 0*  0 or 1*  Vector, range 0-255; on appropriate 
               Address                            half of Data Bus for even/odd address 

Interrupt Return 
1      0110    FFFE00₁₆   0     1        0        Don’t Care  Cascade Index: same as in previous Int. Ack. Cycle 
(The CPU here uses the Cascade Index to find the Cascade Address.) 
2      0111    Cascade    0     1 or 0*  0 or 1*  Don’t Care  Don’t Care 
               Address 


* If the Cascaded ICU Address is Even (A0 is low), then the CPU applies HBE high and reads the vector number from bits 0-7 of the Data Bus. 
If the address is Odd (A0 is high), then the CPU applies HBE low and reads the vector number from bits 8-15 of the Data Bus. The vector number 
may be in the range 0-255. 
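
The ranges in Table 3-4 suggest how the byte read in the first Interrupt Acknowledge cycle of a vectored interrupt can be classified. The sketch below assumes that the byte is interpreted as a signed two's-complement value, so that 0 to 127 is a vector and -16 to -1 is a Cascade Index; the function name is hypothetical and other values are simply rejected here.

```python
INT_ACK_ADDRESS = 0xFFFE00  # address used for maskable interrupt acknowledge
NMI_ACK_ADDRESS = 0xFFFF00  # address used for non-maskable interrupt acknowledge


def classify_ack_byte(byte):
    """Classify the byte read from the low half of the Data Bus."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected an 8-bit value")
    signed = byte - 256 if byte >= 128 else byte  # assumed two's complement
    if signed >= 0:
        return ("vector", signed)
    if -16 <= signed <= -1:
        return ("cascade index", signed)
    raise ValueError("value outside the ranges listed in Table 3-4")


print(classify_ack_byte(0x05))
print(classify_ack_byte(0xFF))
```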


3.4.7 Slave Processor Communication 


The SPC pin is used as the data strobe for Slave Processor 
transfers. In a Slave Processor bus cycle, data is transferred 
on the Data Bus (AD0-AD15), and the status lines ST0-ST3 
are monitored by the Slave Processor in order to determine 
the type of transfer being performed. SPC is bidirectional, 
but is driven by the CPU during all Slave Processor 
bus cycles. See Section 3.8 for full protocol sequences. 


[Figure: the NS32CG16 CPU connected to a Slave Processor, including the ST0-ST3 status lines.] 
FIGURE 3-10. Slave Processor Connections 


[Timing diagram: CTTL, ST0-ST3 and the data bus across the previous cycle (T4 or Ti), the slave read cycle, and the next cycle (Ti or T1).] 
*Note: CPU samples Data Bus here. 
FIGURE 3-11. Slave Processor Read Cycle 


3.4.7.1 Slave Processor Bus Cycles 


A Slave Processor bus cycle always takes exactly two clock 
cycles, labeled T1 and T4 (see Figures 3-11 and 3-12). 
During a Read cycle SPC is active from the beginning of T1 
to the beginning of T4, and the data is sampled at the end of 
T1. The Cycle Status pins lead the cycle by one clock period, 
and are sampled at the leading edge of SPC. During a 
Write cycle, the CPU applies data and activates SPC at T1, 
removing SPC at T4. The Slave Processor latches status on 
the leading edge of SPC and latches data on the trailing 
edge. 


The CPU does not pulse the Address Strobe (ADS), and no 
bus signals are generated. The direction of a transfer is determined 
by the sequence (“protocol”) established by the 
instruction under execution; but the CPU indicates the direction 
on the DDIN pin for hardware debugging purposes. 


3.4.7.2 Slave Operand Transfer Sequences 


A Slave Processor operand is transferred in one or more
Slave bus cycles. A Byte operand is transferred on the
least-significant byte of the Data Bus (AD0-AD7), and a
Word operand is transferred on the entire bus. A Double
Word is transferred in a consecutive pair of bus cycles,
least-significant word first. A Quad Word is transferred in
two pairs of Slave cycles, with other bus cycles possibly
occurring between them. The word order is from
least-significant word to most-significant.
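
The transfer order above can be sketched in a few lines of Python. This is only an illustration of the word ordering described in the text; the function name `slave_bus_words` and the choice to model each bus cycle as one integer are assumptions made for the example, not part of the device specification.

```python
def slave_bus_words(value, size_bytes):
    """Return the values placed on the Data Bus, one per Slave bus cycle.

    Hypothetical helper for illustration only. A Byte uses AD0-AD7 (the
    upper byte lane is not modeled here); larger operands are sent as
    16-bit words, least-significant word first.
    """
    if size_bytes not in (1, 2, 4, 8):
        raise ValueError("operand must be a Byte, Word, Double Word or Quad Word")
    if size_bytes == 1:
        return [value & 0xFF]
    return [(value >> (16 * i)) & 0xFFFF for i in range(size_bytes // 2)]
```

A Double Word therefore takes two cycles and a Quad Word four, which matches the "two pairs of Slave cycles" wording above.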


*Note: Slave Processor samples Data Bus here.

FIGURE 3-12. Slave Processor Write Cycle


3.5 BUS ACCESS CONTROL 


The NS32CG16 CPU has the capability of relinquishing its
access to the bus upon request from a DMA controller or
another CPU. This capability is implemented on the HOLD
(Hold Request) and HLDA (Hold Acknowledge) pins. By
asserting HOLD low, an external device requests access to
the bus. On receipt of HLDA from the CPU, the device may
perform bus cycles, as the CPU at this point has set
AD0-AD15, A16-A23 and HBE to the TRI-STATE® condition and
has switched ADS and DDIN to the input mode. The CPU
now monitors ADS and DDIN from the external device to
generate the relevant strobe signals (i.e., TSO, DBE, RD or
WR). To return control of the bus to the CPU, the device
sets HOLD inactive, and the CPU acknowledges return of
the bus by setting HLDA inactive.


How quickly the CPU releases the bus depends on whether
it is idle on the bus at the time the HOLD request is made,
as the CPU must always complete the current bus cycle.
Figure 3-13 shows the timing sequence when the CPU is
idle. In this case, the CPU grants the bus during the
immediately following clock cycle. Figure 3-14 shows the
sequence if the CPU is using the bus at the time that the
HOLD request is made. If the request is made during or
before the clock cycle shown (two clock cycles before T4),
the CPU will release the bus during the clock cycle following
T4. If the request occurs closer to T4, the CPU may already
have decided to initiate another bus cycle. In that case it will
not grant the bus until after the next T4 state. Note that this
situation will also occur if the CPU is idle on the bus but has
initiated a bus cycle internally.

Note: During DMA cycles the WAIT1-2 signals should be kept inactive,
unless they are also monitored by the DMA controller. If wait states
are required, CWAIT should be used.


FIGURE 3-13. HOLD Timing, Bus Initially Idle

FIGURE 3-14. HOLD Timing, Bus Initially Not Idle


3.6 INSTRUCTION STATUS 


In addition to the four bits of Bus Cycle status (ST0-ST3),
the NS32CG16 CPU also presents Instruction Status
information on three separate pins. These pins differ from
ST0-ST3 in that they are synchronous to the CPU’s internal
instruction execution section rather than to its bus interface
section.


PFS (Program Flow Status) is pulsed low as each instruction 
begins execution. It is intended for debugging purposes. 


U/S originates from the U bit of the Processor Status Regis- 
ter, and indicates whether the CPU is currently running in 
User or Supervisor mode. Although it is not synchronous to 
bus cycles, there are guarantees on its validity during any 
given bus cycle. See the Timing Specifications in Section 4. 


ILO (Interlocked Operation) is activated during an SBITI (Set
Bit, Interlocked) or CBITI (Clear Bit, Interlocked) instruction.
It is made available to external bus arbitration circuitry in
order to allow these instructions to implement the
semaphore primitive operations for multi-processor
communication and resource sharing. ILO is guaranteed to be
active during the operand accesses performed by the
interlocked instructions.

Note: The acknowledge of HOLD is on a cycle by cycle basis. Therefore, it
is possible to have HLDA active when an interlocked operation is in
progress. In this case, ILO remains low and the interlocked instruction
continues only after HOLD is de-asserted.


3.7 EXCEPTION PROCESSING 


Exceptions are special events that alter the sequence of 
instruction execution. The CPU recognizes two basic types 
of exceptions: interrupts and traps. 


An interrupt occurs in response to an event signalled by 
activating the NMI or INT input signals. Interrupts are typi- 
cally requested by peripheral devices that require the CPU’s 
attention. 


Traps occur as a result either of exceptional conditions 
(e.g., attempted division by zero) or of specific instructions 
whose purpose is to cause a trap to occur (e.g., supervisor 
call instruction). 


When an exception is recognized, the CPU saves the PC, 
PSR and the MOD register contents on the interrupt stack 
and then it transfers control to an exception service proce- 
dure. 


Details on the operations performed in the various cases by 
the CPU to enter and exit the exception service procedure 
are given in the following sections. 




Note that the reset operation is not treated here as an
exception, even though, like any exception, it alters the
instruction execution sequence. The reason is that the CPU
handles reset in a significantly different way than it
handles exceptions.


Refer to Section for details on the reset operation. 


3.7.1 Exception Acknowledge Sequence 


When an exception is recognized, the CPU goes through 
three major steps: 


1) Adjustment of Registers. 


Depending on the source of the exception, the CPU may 
restore and/or adjust the contents of the Program Coun- 
ter (PC), the Processor Status Register (PSR) and the 
currently-selected Stack Pointer (SP). A copy of the PSR 
is made, and the PSR is then set to reflect Supervisor 
Mode and selection of the Interrupt Stack. 


2) Vector Acquisition. 


A Vector is either obtained from the Data Bus or is sup- 
plied by default. 


3) Service Call. 


The Vector is used as an index into the Interrupt Dispatch
Table, whose base address is taken from the CPU
Interrupt Base (INTBASE) Register. See Figure 3-15. A
32-bit External Procedure Descriptor is read from the
table entry, and an External Procedure Call is performed
using it. The MOD Register (16 bits) and Program Counter
(32 bits) are pushed on the Interrupt Stack.


[Figure: memory map. The Cascade Table (CASCADE ADDR 0 through
CASCADE ADDR 15) lies below the address held in the Interrupt Base
Register. The Dispatch Table lies above it and holds the fixed
interrupts and traps, followed by the vectored interrupts. Fixed
entries shown, in order: Non-Vectored Interrupt, Non-Maskable
Interrupt, Reserved, Slave Processor Trap, Illegal Operation Trap,
Supervisor Call Trap, 6 Divide by Zero Trap, 7 Flag Trap,
8 Breakpoint Trap, 9 Trace Trap.]

FIGURE 3-15. Interrupt Dispatch and Cascade Tables


This process is illustrated in Figure 3-16, from the viewpoint
of the programmer.

Details on the sequences of events in processing interrupts
and traps are given in the following sections.

[Figure: the vector selects an entry relative to the Interrupt Base;
the Module Table entry supplies the new Static Base for the SB
Register, and the entry point address is loaded into the Program
Counter.]

FIGURE 3-16. Exception Acknowledge Sequence


3.7.2 Returning from an Exception Service Procedure 


To return control to an interrupted program, one of two in- 
structions can be used: RETT (Return from Trap) and RETI 
(Return from Interrupt). 


RETT is used to return from any trap or a non-maskable 
interrupt service procedure. Since some traps are often 
used deliberately as a call mechanism for supervisor mode 
procedures, RETT can also adjust the Stack Pointer (SP) to 
discard a specified number of bytes from the original stack 
as surplus parameter space. 


RETI is used to return from a maskable interrupt service
procedure. Unlike RETT, RETI also informs any
external interrupt control units that interrupt service has
completed. Since interrupts are generally asynchronous
external events, RETI does not discard parameters from the
stack.


Both of the above instructions always restore the PSR, 
MOD, PC and SB registers to their previous contents. 




3.7.3 Maskable Interrupts 


The INT pin is a level-sensitive input. A continuous low level
is allowed for generating multiple interrupt requests. The
input is maskable, and is therefore enabled to generate
interrupt requests only while the Processor Status Register I
bit is set. The I bit is automatically cleared during service of
an INT or NMI request, and is restored to its original setting
upon return from the interrupt service routine via the RETT
or RETI instruction.

The INT pin may be configured via the SETCFG instruction
as either Non-Vectored (CFG Register bit I=0) or Vectored
(bit I=1).

3.7.3.1 Non-Vectored Mode 


In the Non-Vectored mode, an interrupt request on the INT 
pin will cause an Interrupt Acknowledge bus cycle, but the 
CPU will ignore any value read from the bus and use instead 
a default vector of zero. This mode is useful for small sys- 
tems in which hardware interrupt prioritization is unneces- 
sary. 


FIGURE 3-17. Return from Trap (RETT n) Instruction Flow

FIGURE 3-18. Return from Interrupt (RETI) Instruction Flow


3.7.3.2 Vectored Mode: Non-Cascaded Case 


In the Vectored mode, the CPU uses an Interrupt Control
Unit (ICU) to prioritize up to 16 interrupt requests. Upon
receipt of an interrupt request on the INT pin, the CPU
performs an “Interrupt Acknowledge, Master” bus cycle,
reading a vector value from the low-order byte of the Data Bus.
This vector is then used as an index into the Dispatch Table
in order to find the External Procedure Descriptor for the
proper interrupt service procedure. The service procedure
eventually returns via the Return from Interrupt (RETI)
instruction, which performs an End of Interrupt bus cycle,
informing the ICU that it may re-prioritize any interrupt
requests still pending. The ICU provides the vector number
again, which the CPU uses to determine whether it needs
also to inform a Cascaded ICU.


In a system with only one ICU (16 levels of interrupt), the 
vectors provided must be in the range of 0 through 127; that 
is, they must be positive numbers in eight bits. By providing 
a negative vector number, an ICU flags the interrupt source 
as being a Cascaded ICU (see below). 


3.7.3.3 Vectored Mode: Cascaded Case 


In order to allow up to 256 levels of interrupt, provision is
made both in the CPU and in the NS32202 Interrupt Control
Unit (ICU) to transparently support cascading. Figure 3-20
shows a typical cascaded configuration. Note that the
Interrupt output from a Cascaded ICU goes to an Interrupt
Request input of the Master ICU, which is the only ICU which
drives the CPU INT pin.


In a system which uses cascading, two tasks must be per- 
formed upon initialization: 


1) For each Cascaded ICU in the system, the Master ICU
must be informed of the line number (0 to 15) on which it
receives the cascaded requests.


2) A Cascade Table must be established in memory. The 
Cascade Table is located in a NEGATIVE direction from 
the location indicated by the CPU Interrupt Base (INT- 
BASE) Register. Its entries are 32-bit addresses, pointing 
to the Vector Registers of each of up to 16 Cascaded 
ICUs. 


Figure 3-15 illustrates the position of the Cascade Table. To
find the Cascade Table entry for a Cascaded ICU, take its
Master ICU line number (0 to 15) and subtract 16 from it,
giving an index in the range -16 to -1. Multiply this value
by 4, and add the resulting negative number to the contents
of the INTBASE Register. The 32-bit entry at this address
must be set to the address of the Hardware Vector Register
of the Cascaded ICU. This is referred to as the “Cascade
Address.”
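
As a worked illustration of the arithmetic just described, the following Python sketch computes the address of a Cascade Table entry. The function name `cascade_entry_address` and the sample INTBASE value used in the note below are hypothetical; only the formula comes from the text.

```python
def cascade_entry_address(intbase, master_line):
    """Address of the Cascade Table entry for a Cascaded ICU.

    Hypothetical helper: master_line is the Master ICU line number
    (0 to 15) on which the Cascaded ICU is connected.
    """
    if not 0 <= master_line <= 15:
        raise ValueError("Master ICU line number must be 0 to 15")
    index = master_line - 16      # index in the range -16 to -1
    return intbase + 4 * index    # entries are 32 bits wide
```

With INTBASE assumed to hold 0x1000, line 0 maps to 0x0FC0 and line 15 maps to 0x0FFC, so the table occupies the 64 bytes immediately below INTBASE.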


FIGURE 3-19. Interrupt Control Unit Connections (16 Levels)

Upon receipt of an interrupt request from a Cascaded ICU,
the Master ICU interrupts the CPU and provides the
negative Cascade Table index instead of a (positive) vector
number. The CPU, seeing the negative value, uses it as an
index into the Cascade Table and reads the Cascade
Address from the referenced entry. Applying this address, the
CPU performs an “Interrupt Acknowledge, Cascaded” bus
cycle, reading the final vector value. This vector is
interpreted by the CPU as an unsigned byte, and can therefore
be in the range of 0 through 255.
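
The way the CPU distinguishes a plain vector from a Cascade Table index can be sketched as follows. This is a minimal model, assuming the byte supplied by the Master ICU is treated as an 8-bit two's-complement value; the function name `classify_master_byte` and the returned labels are invented for the example.

```python
def classify_master_byte(byte):
    """Interpret the byte read in the 'Interrupt Acknowledge, Master' cycle.

    Hypothetical helper. Returns a (kind, value) pair where value is the
    signed interpretation of the byte.
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected an 8-bit value")
    signed = byte - 256 if byte >= 0x80 else byte
    if signed >= 0:
        return ("vector", signed)      # 0 through 127: used directly
    if signed >= -16:
        return ("cascaded", signed)    # -16 through -1: Cascade Table index
    return ("reserved", signed)        # more negative values are reserved
```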


In returning from a Cascaded interrupt, the service
procedure executes the Return from Interrupt (RETI) instruction,
as it would for any Maskable Interrupt. The CPU performs
an “End of Interrupt, Master” bus cycle, whereupon the
Master ICU again provides the negative Cascade Table
index. The CPU, seeing a negative value, uses it to find the
corresponding Cascade Address from the Cascade Table.
Applying this address, it performs an “End of Interrupt,
Cascaded” bus cycle, informing the Cascaded ICU of the
completion of the service routine. The byte read from the
Cascaded ICU is discarded.

Note: If an interrupt must be masked off, the CPU can do so by setting the 
corresponding bit in the Interrupt Mask Register of the Interrupt Con- 
troller. However, if an interrupt is set pending during the CPU instruc- 
tion that masks off that interrupt, the CPU may still perform an inter- 
rupt acknowledge cycle following that instruction since it might have 
sampled the INT line before the ICU deasserted it. This could cause 
the ICU to provide an invalid vector. To avoid this problem the above 
operation should be performed with the CPU interrupt disabled. 

FIGURE 3-20. Cascaded Interrupt Control Unit Connections


3.7.4 Non-Maskable Interrupt 


The Non-Maskable Interrupt is triggered whenever a falling
edge is detected on the NMI pin. The CPU performs an
“Interrupt Acknowledge, Master” bus cycle when processing
of this interrupt actually begins. The Interrupt Acknowledge
cycle differs from that provided for Maskable Interrupts
in that the address presented is FFFF00₁₆. The vector
value used for the Non-Maskable Interrupt is taken as 1,
regardless of the value read from the bus.


The service procedure returns from the Non-Maskable In- 
terrupt using the Return from Trap (RETT) instruction. No 
special bus cycles occur on return. 


For the full sequence of events in processing the
Non-Maskable Interrupt, see Section 3.7.8.1.


3.7.5 Traps 


Traps are processing exceptions that are generated as
direct results of the execution of an instruction. The Return
Address pushed by any trap except Trap (TRC) is the
address of the first byte of the instruction during which the trap
occurred. Traps do not disable interrupts, as they are not
associated with external events. Traps recognized by the
NS32CG16 CPU are:


Trap (SLAVE): An exceptional condition was detected by
the Floating Point Unit during the execution of a Slave
Instruction. This trap is requested via the Status Word
returned as part of the Slave Processor Protocol (Section
3.8.1).




Trap (ILL): Illegal operation. A privileged operation was
attempted while the CPU was in User Mode (PSR bit U=1).


Trap (SVC): The Supervisor Call (SVC) instruction was exe- 
cuted. 


Trap (DVZ): An attempt was made to divide an integer by 
zero. (The SLAVE trap is used for Floating Point division by 
zero.) 


Trap (FLG): The FLAG instruction detected a ‘1’ in the 
CPU PSR F bit. 


Trap (BPT): The Breakpoint (BPT) instruction was execut- 
ed. 


Trap (TRC): The instruction just completed is being traced. 
See Section 3.7.6. 


Trap (UND): An undefined opcode was encountered by the 
CPU. 


3.7.6 Instruction Tracing 


Instruction tracing is a feature that can be used during de- 
bugging to single-step through selected portions of a pro- 
gram. Tracing is enabled by setting the T-bit in the PSR 
Register. When enabled, the CPU generates a Trace Trap 
(TRC) after the execution of each instruction. 


At the beginning of each instruction, the T bit is copied into
the PSR P (Trace “Pending”) bit. If the P bit is set at the end
of an instruction, then the Trace Trap is activated. If any
other trap or interrupt request is made during a traced
instruction, its entire service procedure is allowed to complete
before the Trace Trap occurs. Each interrupt and trap
sequence handles the P bit for proper tracing, guaranteeing
only one Trace Trap per instruction, and guaranteeing that
the Return Address pushed during a Trace Trap is always
the address of the next instruction to be traced.
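
The T-to-P handshake can be modeled in a few lines. This is a simplified sketch, assuming the PSR is represented as a dictionary with only the T and P bits; the callables `nop` and `clear_t_and_p` are hypothetical stand-ins for an ordinary instruction and for an instruction that clears both bits.

```python
def run_traced_instruction(psr, instruction):
    """Model one instruction; return True if a Trace Trap follows it.

    Hypothetical model: psr is a dict with integer 'T' and 'P' entries,
    and instruction is a callable that may modify psr.
    """
    psr["P"] = psr["T"]      # start of instruction: T is copied into P
    instruction(psr)         # the instruction itself may alter T and P
    return bool(psr["P"])    # Trace Trap is activated if P is still set


def nop(psr):
    """Stand-in for an instruction that leaves the PSR alone."""


def clear_t_and_p(psr):
    """Stand-in for an instruction that clears both T and P."""
    psr["T"] = 0
    psr["P"] = 0
```

The model deliberately ignores interrupts and other traps, which the real sequences handle by adjusting the P bit as described in Section 3.7.8.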


Because some instructions can clear the T and P bits in the
PSR, in some cases a Trace Trap may not occur at the end
of the instruction. This happens when one of the privileged
instructions BICPSRW or LPRW PSR is executed.


In other cases, it is still possible to guarantee that a Trace
Trap occurs at the end of the instruction, provided that
special care is taken before returning from the Trace Trap
Service Procedure. In case a BICPSRB instruction has been
executed, the service procedure should make sure that the T
bit in the PSR copy saved on the Interrupt Stack is set
before executing the RETT instruction to return to the program
being traced. If the RETT or RETI instructions have to be
traced, the Trace Trap Service Procedure should set the P
and T bits in the PSR copy on the Interrupt Stack that is
going to be restored in the execution of such instructions.


While debugging the NS32CG16 instructions which have in- 
terior loops (BBOR, BBXOR, BBAND, BBFOR, EXTBLT, 
MOVMP, SBITPS, TBITS), special care must be taken with 
the single-step trap. If an interrupt occurs during a single- 
step of one of the graphics instructions, the interrupt will be 
serviced. Upon return from the interrupt service routine, the 
new NS32CG16 instruction will not be re-entered, due to a 
single-step trap. Both the NMI and INT interrupts will cause 
this behavior. Another single-step operation (S command in 
DBG16/MONCG) will resume from where the instruction 
was interrupted. There are no side effects from this early 
termination, and the instruction will complete normally. 


For all other Series 32000 instructions, a single-step
operation will complete the entire instruction before trapping back
to the debugger. On the instructions mentioned above,
several single-step commands may be required to complete the
instruction, ONLY when interrupts are occurring.


There are some methods to give the appearance of single- 
stepping for these NS32CG16 instructions. 


1. MON16/MONCG monitors the PC value on return from the
single-step trap vector. If the PC has not changed since the
last single-step command was issued, the single-step
operation is repeated. It is also advisable to ensure that one of
the NS32CG16 instructions is being single-stepped, by
inspecting the first byte at the address pointed to by the PC
register. If it is 0x0E, then the instruction is an
NS32CG16-specific instruction.

2. A breakpoint following the instruction would also trap
after the instruction had completed.

Note: If instruction tracing is enabled while the WAIT instruction is executed,
the Trap (TRC) occurs after the next interrupt, when the interrupt
service procedure has returned.
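
The check described in method 1 might look like the sketch below. It is not the MON16/MONCG source; the function name `should_repeat_single_step` and the representation of memory as a mapping from address to byte are assumptions made for illustration.

```python
NS32CG16_PREFIX = 0x0E  # first byte of an NS32CG16-specific instruction


def should_repeat_single_step(pc_before, pc_after, memory):
    """Decide whether a debugger should reissue the single-step command.

    Hypothetical helper: memory maps an address to the byte stored there.
    The step is repeated only if the PC did not advance and the
    instruction at the PC is an NS32CG16-specific one.
    """
    if pc_after != pc_before:
        return False
    return memory[pc_after] == NS32CG16_PREFIX
```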


3.7.7 Priority Among Exceptions 


The NS32CG16 CPU internally prioritizes simultaneous in- 
terrupt and trap requests as follows: 


1) Traps other than Trace (Highest priority) 
2) Non-Maskable Interrupt 

3) Maskable Interrupts 

4) Trace Trap (Lowest priority) 
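
The priority order above amounts to a fixed lookup. The following sketch picks the request to service first from a set of simultaneous requests; the string labels and the function name `next_exception` are invented for the example.

```python
# Highest priority first, following the list above.
PRIORITY_ORDER = ("trap", "nmi", "int", "trace")


def next_exception(pending):
    """Return the highest-priority pending request, or None if there is none.

    Hypothetical helper: pending is any collection of the labels
    'trap' (traps other than Trace), 'nmi', 'int' and 'trace'.
    """
    for kind in PRIORITY_ORDER:
        if kind in pending:
            return kind
    return None
```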


3.7.8 Exception Acknowledge Sequences: Detail Flow 


For purposes of the following detailed discussion of
interrupt and trap acknowledge sequences, a single sequence
called “Service” is defined in Figure 3-21. Upon detecting
any interrupt request or trap condition, the CPU first
performs a sequence dependent upon the type of interrupt or
trap. This sequence will include pushing the Processor
Status Register and establishing a Vector and a Return
Address. The CPU then performs the Service sequence.


3.7.8.1 Maskable/Non-Maskable Interrupt Sequence

This sequence is performed by the CPU when the NMI pin
receives a falling edge, or the INT pin becomes active with
the PSR I bit set. The interrupt sequence begins either at
the next instruction boundary or, in the case of the String
instructions or the Graphics instructions which have interior
loops (BBOR, BBXOR, BBAND, BBFOR, EXTBLT, MOVMP,
SBITPS, TBITS), at the next interruptible point during
execution. The graphics instructions are interruptible.

1. If a String instruction was interrupted and not yet
completed:

a. Clear the Processor Status Register P bit.

b. Set “Return Address” to the address of the first byte
of the interrupted instruction.

Otherwise, set “Return Address” to the address of the
next instruction.

2. Copy the Processor Status Register (PSR) into a
temporary register, then clear PSR bits S, U, T, P and I.

3. If the interrupt is Non-Maskable:

a. Read a byte from address FFFF00₁₆, applying Status
Code 0100 (Interrupt Acknowledge, Master: Section
3.4.1). Discard the byte read.

b. Set “Vector” to 1.

c. Go to Step 8.

4. If the interrupt is Non-Vectored:

a. Read a byte from address FFFF00₁₆, applying Status
Code 0100 (Interrupt Acknowledge, Master: Section
3.4.1). Discard the byte read.

b. Set “Vector” to 0.

c. Go to Step 8.

5. Here the interrupt is Vectored. Read “Byte” from
address FFFE00₁₆, applying Status Code 0100 (Interrupt
Acknowledge, Master: Section 3.4.1).

6. If “Byte” ≥ 0, then set “Vector” to “Byte” and go to
Step 8.

7. If “Byte” is in the range -16 through -1, then the
interrupt source is Cascaded. (More negative values are
reserved for future use.) Perform the following:

a. Read the 32-bit Cascade Address from memory. The
address is calculated as INTBASE + 4 * Byte.

b. Read “Vector”, applying the Cascade Address just
read and Status Code 0101 (Interrupt Acknowledge,
Cascaded: Section 3.4.1).

8. Push the PSR copy (from Step 2) onto the Interrupt
Stack as a 16-bit value.


9. Perform Service (Vector, Return Address), Figure 3-21.

Service (Vector, Return Address):

1) Read the 32-bit External Procedure Descriptor from the
Interrupt Dispatch Table: address is
Vector * 4 + INTBASE Register contents.

2) Move the Module field of the Descriptor into the MOD
Register.

3) Read the new Static Base pointer from the memory
address contained in MOD, placing it into the SB Register.

4) Read the Program Base pointer from memory address
MOD + 8, and add to it the Offset field from the
Descriptor, placing the result in the Program Counter.

5) Flush Queue: Non-sequentially fetch first instruction of
Interrupt Routine.

6) Push MOD Register onto the Interrupt Stack as a 16-bit
value. (The PSR has already been pushed as a 16-bit
value.)

7) Push the Return Address onto the Interrupt Stack as a
32-bit quantity.

FIGURE 3-21. Service Sequence
Invoked during All Interrupt/Trap Sequences
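
Steps 1 through 4 of the Service sequence can be sketched as below. This is an illustrative model rather than a description of the hardware. It assumes little-endian memory and a Descriptor whose Module field occupies the low-order 16 bits and whose Offset field occupies the high-order 16 bits; neither detail is stated in this section. The helper names `read_le` and `service_registers` are hypothetical, and the queue flush and stack pushes (steps 5 through 7) are omitted.

```python
def read_le(memory, address, size):
    """Read a little-endian integer of `size` bytes from a byte mapping."""
    data = bytes(memory[address + i] for i in range(size))
    return int.from_bytes(data, "little")


def service_registers(memory, intbase, vector):
    """Compute the MOD, SB and PC values loaded by the Service sequence.

    Hypothetical model: memory maps an address to a byte value.
    """
    descriptor = read_le(memory, intbase + 4 * vector, 4)  # step 1
    mod = descriptor & 0xFFFF            # step 2 (assumed field layout)
    offset = descriptor >> 16            # assumed field layout
    sb = read_le(memory, mod, 4)         # step 3: Static Base at MOD
    program_base = read_le(memory, mod + 8, 4)
    pc = program_base + offset           # step 4
    return {"MOD": mod, "SB": sb, "PC": pc}
```
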
3.7.8.2 Trap Sequence: Traps Other Than Trace 


1) Restore the currently selected Stack Pointer and the
Processor Status Register to their original values at the
start of the trapped instruction.


2) Set “Vector” to the value corresponding to the trap type.


SLAVE: Vector = 3.
ILL: Vector = 4.
SVC: Vector = 5.
DVZ: Vector = 6.
FLG: Vector = 7.
BPT: Vector = 8.
UND: Vector = 10.


3) Copy the Processor Status Register (PSR) into a
temporary register, then clear PSR bits S, U, P and T.


4) Push the PSR copy onto the Interrupt Stack as a 16-bit
value.


5) Set “Return Address” to the address of the first byte of
the trapped instruction.


6) Perform Service (Vector, Return Address), Figure 3-21.
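
The register handling in steps 2 through 5 can be summarized in a short sketch. To avoid committing to PSR bit positions, which this section does not give, the PSR is modeled here as a set of flag letters; that representation, the function name `enter_trap`, and the returned dictionary are all assumptions for illustration. The UND entry uses vector 10, its position in the Series 32000 dispatch table.

```python
TRAP_VECTORS = {
    "SLAVE": 3, "ILL": 4, "SVC": 5, "DVZ": 6,
    "FLG": 7, "BPT": 8, "UND": 10,
}


def enter_trap(trap, psr_flags, instruction_address):
    """Model steps 2-5 of the trap sequence for traps other than Trace.

    Hypothetical model: psr_flags is a set of PSR flag letters that are
    currently set, for example {"U", "S", "I"}.
    """
    saved_psr = frozenset(psr_flags)                  # copy pushed on the stack
    new_psr = set(psr_flags) - {"S", "U", "P", "T"}   # bits cleared on entry
    return {
        "vector": TRAP_VECTORS[trap],
        "saved_psr": saved_psr,
        "psr": new_psr,
        "return_address": instruction_address,  # first byte of trapped instruction
    }
```

Note that the I bit is left untouched, consistent with the statement in Section 3.7.5 that traps do not disable interrupts.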


3.7.8.3 Trace Trap Sequence 

1) In the Processor Status Register (PSR), clear the P bit.

2) Copy the PSR into a temporary register, then clear PSR
bits S, U and T.

3) Push the PSR copy onto the Interrupt Stack as a 16-bit
value.

4) Set “Vector” to 9.

5) Set “Return Address” to the address of the next
instruction.

6) Perform Service (Vector, Return Address), Figure 3-21.


3.8 SLAVE PROCESSOR INSTRUCTIONS 


The NS32CG16 supports only one group of instructions, the 
floating point instruction set, as being executable by a slave 
processor. The floating point instruction set is validated by 
the F bit in the CFG register. 


If a floating-point instruction is encountered and the F bit in 
the CFG register is not set, a Trap(UND) will result, without 
any slave processor communication attempted by the CPU. 
This allows software emulation in case an external floating 
point unit (FPU) is not used. 


3.8.1 Slave Processor Protocol 


Slave Processor instructions have a three-byte Basic In- 
struction field, consisting of an ID Byte followed by an Oper- 
ation Word. The ID Byte has three functions: 


1) It identifies the instruction as being a Slave Processor 
instruction. 


2) It specifies which Slave Processor will execute it. 


3) It determines the format of the following Operation Word 
of the instruction. 


Upon receiving a Slave Processor instruction, the CPU
initiates the sequence outlined in Figure 3-22. While applying
Status Code 1111 (Broadcast ID, Section 3.4.1), the CPU
transfers the ID Byte on the least-significant half of the Data
Bus (AD0-AD7). All Slave Processors input this byte and
decode it. The Slave Processor selected by the ID Byte is
activated, and from this point the CPU is communicating
only with it. If any other slave protocol was in progress (e.g.,
an aborted Slave instruction), this transfer cancels it.


The CPU next sends the Operation Word while applying
Status Code 1101 (Transfer Slave Operand, Section 3.4.1).
Upon receiving it, the Slave Processor decodes it, and at
this point both the CPU and the Slave Processor are aware
of the number of operands to be transferred and their sizes.
The Operation Word is swapped on the Data Bus; that is,
bits 0-7 appear on pins AD8-AD15 and bits 8-15 appear
on pins AD0-AD7.
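
The byte swap applied to the Operation Word is easy to get backwards, so a one-line sketch may help. The function name `operation_word_on_bus` is hypothetical; bit 0 of the returned value corresponds to pin AD0.

```python
def operation_word_on_bus(operation_word):
    """Return the 16-bit value seen on AD0-AD15 for an Operation Word.

    Hypothetical helper: bits 0-7 of the word move to AD8-AD15 and
    bits 8-15 move to AD0-AD7.
    """
    low = operation_word & 0xFF
    high = (operation_word >> 8) & 0xFF
    return (low << 8) | high
```

Applying the function twice returns the original word, which is how a model of the Slave Processor side would recover it.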




Using the Addressing Mode fields within the Operation 
Word, the CPU starts fetching operands and issuing them to 
the Slave Processor. To do so, it references any Addressing 
Mode extensions which may be appended to the Slave 
Processor instruction. Since the CPU is solely responsible 
for memory accesses, these extensions are not sent to the 
Slave Processor. The Status Code applied is 1101 (Transfer 
Slave Processor Operand, Section 3.4.1). 


Status Combinations:

Send ID (ID):        Code 1111
Xfer Operand (OP):   Code 1101
Read Status (ST):    Code 1110

Step  Status  Action
1     ID      CPU Sends ID Byte.
2     OP      CPU Sends Operation Word.
3     OP      CPU Sends Required Operands.
4     -       Slave Starts Execution. CPU Pre-Fetches.
5     -       Slave Pulses SPC Low.
6     ST      CPU Reads Status Word. (Trap? Alter Flags?)
7     OP      CPU Reads Results (If Any).

FIGURE 3-22. Slave Processor Protocol

After the CPU has issued the last operand, the Slave
Processor starts the actual execution of the instruction. Upon
completion, it will signal the CPU by pulsing SPC low.


While the Slave Processor is executing the instruction, the 
CPU is free to prefetch instructions into its queue. If it fills 
the queue before the Slave Processor finishes, the CPU will 
wait, applying Status Code 0011 (Waiting for Slave). 


Upon receiving the pulse on SPC, the CPU uses SPC to
read a Status Word from the Slave Processor, applying
Status Code 1110 (Read Slave Status). This word has the
format shown in Figure 3-23. If the Q bit (“Quit”, Bit 0) is set,
this indicates that an error was detected by the Slave
Processor. The CPU will not continue the protocol, but will
immediately trap through the Slave vector in the Interrupt
Table. Certain Slave Processor instructions cause CPU PSR
bits to be loaded from the Status Word.


The last step in the protocol is for the CPU to read a result, 
if any, and transfer it to the destination. The Read cycles 
from the Slave Processor are performed by the CPU while 
applying Status Code 1101 (Transfer Slave Operand). 
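
The protocol of Figure 3-22 can be traced with a small model that lists the bus transfers the CPU performs and the Status Code applied to each. This is a sketch under simplifying assumptions: operand and result sizes are given directly as counts of Slave bus cycles, prefetching and the Waiting for Slave status are ignored, and the function name `slave_protocol` is invented for the example.

```python
STATUS_CODES = {"ID": 0b1111, "OP": 0b1101, "ST": 0b1110}


def slave_protocol(operand_cycles, status_word, result_cycles):
    """List the Slave bus transfers for one instruction.

    Hypothetical model. Returns (transfers, outcome) where transfers is
    a list of (name, status_code) pairs and outcome is 'trap' if the
    Q bit (Bit 0) of the Status Word is set, otherwise 'done'.
    """
    transfers = [("ID", STATUS_CODES["ID"]),    # step 1: ID Byte
                 ("OP", STATUS_CODES["OP"])]    # step 2: Operation Word
    transfers += [("OP", STATUS_CODES["OP"])] * operand_cycles  # step 3
    transfers.append(("ST", STATUS_CODES["ST"]))                # step 6
    if status_word & 1:
        return transfers, "trap"    # Q set: protocol ends, CPU traps
    transfers += [("OP", STATUS_CODES["OP"])] * result_cycles   # step 7
    return transfers, "done"
```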


3.8.2 Floating Point Instructions 


Table 3-5 gives the protocols followed for each Floating 
Point instruction. The instructions are referenced by their 
mnemonics. For the bit encodings of each instruction, see 
Appendix A. 


TABLE 3-5. Floating Point Instruction Protocols

           Operand 1  Operand 2  Operand 1  Operand 2  Returned Value  PSR Bits
Mnemonic   Class      Class      Issued     Issued     Type and Dest.  Affected
ADDf       read.f     rmw.f      f          f          f to Op. 2      none
SUBf       read.f     rmw.f      f          f          f to Op. 2      none
MULf       read.f     rmw.f      f          f          f to Op. 2      none
DIVf       read.f     rmw.f      f          f          f to Op. 2      none
MOVf       read.f     write.f    f          N/A        f to Op. 2      none
ABSf       read.f     write.f    f          N/A        f to Op. 2      none
NEGf       read.f     write.f    f          N/A        f to Op. 2      none
CMPf       read.f     read.f     f          f          N/A             N,Z,L
FLOORfi    read.f     write.i    f          N/A        i to Op. 2      none
TRUNCfi    read.f     write.i    f          N/A        i to Op. 2      none
ROUNDfi    read.f     write.i    f          N/A        i to Op. 2      none
MOVFL      read.F     write.L    F          N/A        L to Op. 2      none
MOVLF      read.L     write.F    L          N/A        F to Op. 2      none
MOVif      read.i     write.f    i          N/A        f to Op. 2      none
LFSR       read.D     N/A        D          N/A        N/A             none
SFSR       N/A        write.D    N/A        N/A        D to Op. 2      none
POLYf      read.f     read.f     f          f          f to F0         none
DOTf       read.f     read.f     f          f          f to F0         none
SCALBf     read.f     rmw.f      f          f          f to Op. 2      none
LOGBf      read.f     write.f    f          N/A        f to Op. 2      none

Note:
D = Double Word
i = integer size (B,W,D) specified in mnemonic.
f = Floating Point type (F,L) specified in mnemonic.
N/A = Not Applicable to this instruction.

The Operand Class columns give the Access Class for each
general operand, defining how the addressing modes are
interpreted (see Series 32000 Instruction Set Reference
Manual).

The Operand Issued columns show the sizes of the
operands issued to the Floating Point Unit by the CPU. “D”
indicates a 32-bit Double Word. “i” indicates that the instruction
specifies an integer size for the operand (B = Byte,
W = Word, D = Double Word). “f” indicates that the
instruction specifies a Floating Point size for the operand
(F = 32-bit Standard Floating, L = 64-bit Long Floating).

The Returned Value Type and Destination column gives the
size of any returned value and where the CPU places it. The
PSR Bits Affected column indicates which PSR bits, if any,
are updated from the Slave Processor Status Word (Figure
3-23).

[Figure: 16-bit status word, bits 15 to 0. Labeled fields:
New PSR Bit Value(s); Quit: Terminate Protocol, Trap(FPU).]

TL/EE/9424-28
FIGURE 3-23. Slave Processor Status Word Format
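
As an illustration of how the Status Word feeds the PSR, the following Python sketch decodes a 16-bit status word and merges the returned flag values into a PSR image. The bit positions are assumptions, because the figure itself is not legible in this copy: Q is taken as bit 0, and N, Z and L are taken at bits 7, 6 and 2, matching their usual positions in the Series 32000 PSR. The function names are hypothetical.

```python
# Assumed bit positions (not recoverable from this copy of Figure 3-23).
Q_BIT = 1 << 0   # Quit: terminate protocol, Trap(FPU)
L_BIT = 1 << 2   # assumed to match PSR bit L
Z_BIT = 1 << 6   # assumed to match PSR bit Z
N_BIT = 1 << 7   # assumed to match PSR bit N
PSR_MASK = N_BIT | Z_BIT | L_BIT


def decode_status_word(word):
    """Split a 16-bit slave status word into named flags."""
    return {
        "quit": bool(word & Q_BIT),
        "N": bool(word & N_BIT),
        "Z": bool(word & Z_BIT),
        "L": bool(word & L_BIT),
    }


def apply_to_psr(psr, word):
    """Return a PSR image with N, Z and L replaced by the status word values.

    Only meaningful for instructions whose PSR Bits Affected entry in
    Table 3-5 is not "none" (CMPf in that table).
    """
    return (psr & ~PSR_MASK & 0xFFFF) | (word & PSR_MASK)
```

If the quit flag is set, the CPU abandons the protocol and traps, so a caller would test `decode_status_word(word)["quit"]` before calling `apply_to_psr`.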


Any operand indicated as being of type "f" will not cause a
transfer if the Register addressing mode is specified. This is 
because the Floating Point Registers are physically on the 
Floating Point Unit and are therefore available without CPU 
assistance. 
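
The rule above can be sketched in Python using a few rows of Table 3-5. This is only an illustration: the `Protocol` tuple, the `PROTOCOLS` dictionary and `operands_transferred` are hypothetical names, and only four of the table's rows are encoded.

```python
from typing import NamedTuple, Optional


class Protocol(NamedTuple):
    op1_issued: Optional[str]   # "f", "i", "D" or None for N/A
    op2_issued: Optional[str]
    returned: Optional[str]     # returned value type and destination
    psr_bits: tuple


# A few rows copied from Table 3-5 (not the complete table).
PROTOCOLS = {
    "ADDf":  Protocol("f", "f", "f to Op. 2", ()),
    "MOVf":  Protocol("f", None, "f to Op. 2", ()),
    "CMPf":  Protocol("f", "f", None, ("N", "Z", "L")),
    "MOVif": Protocol("i", None, "f to Op. 2", ()),
}


def operands_transferred(mnemonic, op1_in_fpu_register, op2_in_fpu_register):
    """List the operand numbers the CPU must send to the FPU.

    An operand issued as type "f" is skipped when it uses the Register
    addressing mode, because it already resides in the FPU.
    """
    protocol = PROTOCOLS[mnemonic]
    sent = []
    for number, issued, in_register in (
        (1, protocol.op1_issued, op1_in_fpu_register),
        (2, protocol.op2_issued, op2_in_fpu_register),
    ):
        if issued is None:
            continue              # N/A: operand is never issued
        if issued == "f" and in_register:
            continue              # already on the FPU, no transfer
        sent.append(number)
    return sent
```

For example, `operands_transferred("ADDf", True, False)` reports that only Operand 2 crosses the bus, while an integer source operand of MOVif is always transferred.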


4.0 Device Specifications 


4.1 NS32CG16 PIN DESCRIPTIONS 


The following is a brief description of all NS32CG16 pins. 
The descriptions reference portions of the Functional De- 
scription, Section 3. 

Unless otherwise indicated, reserved pins should be left 
open. 


Note: An asterisk next to the signal name indicates a TRI-STATE condition 
for that signal during HOLD acknowledge. 


4.1.1 Supplies

VCCL        Logic Power.
            +5V positive supply for on-chip logic.

VCCCTTL,    Buffers Power.
VCCFCLK,    +5V positive supplies for on-chip output
VCCAD,      buffers.
VCCIO

VSSL        Logic Ground.
            Ground reference for on-chip logic.

VSSFCLK,    Buffers Ground.
VSSNTSC,    Ground reference for on-chip output buffers.
VSSHAD,
VSSLAD,
VSSIO

4.1.2 Input Signals


RSTI Reset Input. 
Schmitt triggered, asynchronous signal used to 
generate a CPU reset. See Section 3.3. 


Note: 


The reset signal is a true asynchronous input. Therefore, no 
external synchronizing circuit is needed. 

When RSTI changes right before the falling edge of CTTL, 
and meets the specified set-up time, it will be recognized on 
that falling edge. Otherwise it will be recognized on the fall- 
ing edge of CTTL in the following clock cycle. 


HOLD Hold Request. 
When active, causes the CPU to release the 
bus for DMA or multiprocessing purposes. See 
Section 3.5. 
Note: 
If the HOLD signal is generated asynchronously, its set up 
and hold times may be violated. In this case, it is recom- 
mended to synchronize it with CTTL to minimize the possibili- 
ty of metastable states. 
The CPU provides only one synchronization stage to mini- 
mize the HLDA latency. This is to avoid speed degradations 
in cases of heavy HOLD activity (i.e., DMA controller cycles 
interleaved with CPU cycles). 

INT Interrupt.
A low level on this pin requests a maskable interrupt.
INT must be kept asserted until the interrupt is
acknowledged.
Note:
If INT is from an asynchronous source, it should be
synchronized with CTTL to minimize the possibility of
metastable states.

NMI Non-Maskable Interrupt.
A High-to-Low transition on this signal requests
a non-maskable interrupt.

CWAIT Continuous Wait. 
Causes the CPU to insert continuous wait 
states if sampled low at the end of T2 and each 
following T-State. See Section 3.4.3. 

WAIT1-2 Two-Bit Wait State Inputs. 
These inputs, collectively called WAIT1-2, al- 
low from zero to three wait states to be speci- 
fied. They are binary weighted. See Section 
3.4.3. 
Note: 
During a DMAC cycle, WAIT1-2 should be kept inactive to 
prevent loss of synchronization. Wait states, in this case, 
should be generated through CWAIT. 
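
As a rough illustration of the binary weighting, the Python sketch below computes the number of wait states requested through WAIT1-2. It assumes WAIT1 carries weight 1 and WAIT2 carries weight 2 (suggested by the pin names but not stated above), and it takes "asserted" booleans rather than electrical levels, so pin polarity is left out. Both function names are hypothetical.

```python
def programmed_wait_states(wait1_asserted, wait2_asserted):
    """Number of wait states (0 to 3) requested through WAIT1-2.

    Assumption: WAIT1 has weight 1 and WAIT2 has weight 2.
    """
    return (1 if wait1_asserted else 0) + (2 if wait2_asserted else 0)


def check_dmac_cycle(wait1_asserted, wait2_asserted):
    """Raise if WAIT1-2 is active during a DMAC cycle.

    During a DMAC cycle wait states should come from CWAIT instead,
    to prevent loss of synchronization.
    """
    if wait1_asserted or wait2_asserted:
        raise ValueError("WAIT1-2 must be inactive during a DMAC cycle")
```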

OSCIN Crystal/External Clock Input. 
Input from a crystal or an external clock source. 
See Section 3.2. 

4.1.3 Output Signals 

A16-A23 *High-Order Address Bits.


These are the most significant 8 bits of the 
memory address bus. 

HBE *High Byte Enable. 
Status signal used to enable data transfers on 
the most significant byte of the data bus. 



ST0-3 Status.

Bus cycle status code; ST0 is the least significant.
Encodings are:

0000 - Idle: CPU Inactive on Bus.
0001 - Idle: WAIT Instruction.
0010 - (Reserved)
0011 - Idle: Waiting for Slave.
0100 - Interrupt Acknowledge, Master.
0101 - Interrupt Acknowledge, Cascaded.
0110 - End of Interrupt, Master.
0111 - End of Interrupt, Cascaded.
1000 - Sequential Instruction Fetch.
1001 - Non-Sequential Instruction Fetch.
1010 - Data Transfer.
1011 - Read Read-Modify-Write Operand.
1100 - Read for Effective Address.
1101 - Transfer Slave Operand.
1110 - Read Slave Status Word.
1111 - Broadcast Slave ID.
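
For tooling such as a logic-analyzer decoder, the status encodings listed above can be captured in a small lookup table. The Python sketch below is one way to do it; `STATUS_CODES` and `decode_status` are hypothetical names, and the only fact used beyond the list is that ST0 is the least significant bit.

```python
STATUS_CODES = {
    0b0000: "Idle: CPU Inactive on Bus",
    0b0001: "Idle: WAIT Instruction",
    0b0010: "(Reserved)",
    0b0011: "Idle: Waiting for Slave",
    0b0100: "Interrupt Acknowledge, Master",
    0b0101: "Interrupt Acknowledge, Cascaded",
    0b0110: "End of Interrupt, Master",
    0b0111: "End of Interrupt, Cascaded",
    0b1000: "Sequential Instruction Fetch",
    0b1001: "Non-Sequential Instruction Fetch",
    0b1010: "Data Transfer",
    0b1011: "Read Read-Modify-Write Operand",
    0b1100: "Read for Effective Address",
    0b1101: "Transfer Slave Operand",
    0b1110: "Read Slave Status Word",
    0b1111: "Broadcast Slave ID",
}


def decode_status(st0, st1, st2, st3):
    """Combine the four status pins (ST0 = least significant) and name the cycle."""
    code = (st3 << 3) | (st2 << 2) | (st1 << 1) | st0
    return code, STATUS_CODES[code]
```

Sampling ST0=1, ST1=0, ST2=1, ST3=1 gives code 1101, the Transfer Slave Operand cycle used when the CPU reads a result from the Floating Point Unit.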

U/S User/Supervisor. 

User or Supervisor Mode status. High indicates 
User Mode; low indicates Supervisor Mode. 

ILO Interlocked Operation.

When active, indicates that an interlocked oper- 
ation is being executed. 

HLDA Hold Acknowledge.
Activated by the CPU in response to the HOLD 
input to indicate that the CPU has released the 
bus. 

PFS Program Flow Status. 

A pulse on this signal indicates the beginning of 
execution of an instruction. 

BPU BPU Cycle. 

This signal is activated during a bus cycle to 
enable an external BITBLT processing unit. The 
EXTBLT instruction activates this signal.* 

RSTO Reset Output. 

This signal becomes active when RSTI is low, 
initiating a system reset. 

RD Read Strobe. 

Activated during CPU or DMAC read cycles to 
enable reading of data from memory or periph- 
erals. See Section 3.4.2. 

WR Write Strobe. 

Activated during CPU or DMAC write cycles to 
enable writing of data to memory or peripherals. 


*Note: BPU is low (Active) only during bus cycles involving pre-fetching
instructions and execution of EXTBLT operands. It is recommended
that BPU, ADS and status lines (ST0-ST3) be used to qualify BPU
bus cycles. If a DMA circuit exists in the system, the HLDA signal
should be used to further qualify BPU cycles. BPU may become
active during T4 of a non-BPU bus cycle, and may become inactive
during T4 of a BPU bus cycle. BPU must be qualified by ADS and
status lines (ST0-ST3) to be used as an external gating signal.


TSO Timing State Output. 
The falling edge of TSO identifies the beginning 
of state T2 of a bus cycle. The rising edge iden- 
tifies the beginning of state T4. 


DBE Data Buffers Enable. 
Used to control external data buffers. It is active 
when the data buffers are to be enabled. 


OSCOUT Crystal Output. 
This line is used as the return path for the crys- 
tal (if used). It must be left open when an exter- 
nal clock source is used to drive OSCIN. 


FCLK Fast Clock. 
This clock is derived from the clock waveform 
on OSCIN. Its frequency is either the same as 
OSCIN or is lower, depending upon the scale 
factor programmed into the CFG register. See 
Section 3.2.1. 


PHI1, PHI2 Two-Phase Clock. 
These outputs provide a two-phase clock with 
frequency half that of FCLK. They can be used 
to clock the DP8510/DP8511 BPU. The trace 
lengths of PHI1 and PHI2 should be shorter 
than 4 inches (10 centimeters) when connected 
to the BPU. 


CTTL System Clock. 
This clock is similar to PHI1 but has a much 
higher driving capability. The skew between its 
rising edge and PHI1 rising edge is kept to a 
minimum.
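
The clock relationships described for FCLK, PHI1/PHI2 and CTTL can be summarized with a short Python sketch. It assumes the CFG scale factor is supplied as a plain divisor (the permitted values are given in Section 3.2.1, not here) and that CTTL runs at the PHI1 frequency, which is how "similar to PHI1" is read here. `derived_clocks` is a hypothetical name.

```python
def derived_clocks(oscin_hz, scale_factor=1):
    """Return the derived clock frequencies in Hz.

    scale_factor is the divisor programmed into the CFG register
    (assumed to be passed in directly as an integer >= 1).
    """
    if scale_factor < 1:
        raise ValueError("scale factor must be at least 1")
    fclk = oscin_hz / scale_factor   # FCLK: same as OSCIN or lower
    phi = fclk / 2                   # PHI1/PHI2: half the FCLK frequency
    return {"FCLK": fclk, "PHI1/PHI2": phi, "CTTL": phi}
```

With an illustrative 30 MHz input and a scale factor of 1, the sketch yields a 30 MHz FCLK and 15 MHz PHI1, PHI2 and CTTL.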


4.1.4 Input-Output Signals 


AD0-15 *Address/Data Bus.
Multiplexed Address/Data information. Bit 0 is 
the least significant bit of each. 


SPC Slave Processor Control. 
Used by the CPU as the data strobe output for 
slave processor transfers; used by a slave proc- 
essor to acknowledge completion of a slave in- 
struction. See Section 3.4.7.1. 


DDIN *Data Direction. 
Status signal indicating the direction of the data 
transfer during a bus cycle. During HOLD ac- 
knowledge this signal becomes an input and 
determines the activation of RD or WR. 


ADS *Address Strobe.
Controls address latches; signals the beginning
of a bus cycle. During HOLD acknowledge this
signal becomes an input and the CPU monitors
it to detect the beginning of a DMAC cycle and
generate the relevant strobe signals. When a
DMAC is used, ADS should be pulled up to VCC
through a 10 kΩ resistor.

4.2 ABSOLUTE MAXIMUM RATINGS 


If Military/Aerospace specified devices are required,
please contact the National Semiconductor Sales
Office/Distributors for availability and specifications.

Temperature Under Bias               0°C to +70°C
Storage Temperature                  -65°C to +150°C
All Input or Output Voltages with
Respect to GND                       -0.5V to +7V

Note: Absolute maximum ratings indicate limits beyond
which permanent damage may occur. Continuous operation
at these limits is not intended; operation should be limited to
those conditions specified under Electrical Characteristics.


4.3 ELECTRICAL CHARACTERISTICS: TA = 0°C to +70°C, VCC = 5V ±5%, GND = 0V


Symbol 

Vin__| HighLevel inputVottage | (Noted) | 

Vit 

Vr+ | RSTIRising ThresholdVottage | Voo=50V | 

Vays | RSTiHysteresis Voltage | Voo=5.0V | 
ae 
| 0.90 Voo_| 


VxXL OSCIN Input Low Voltage | 


ae 
Vx | OSciNinputHigh Votage | | Vo 0.9 
VOH 0.90 Vcc 
Vor__| LowLevelOutputVottage | ton =4mA | 
ns | SPCinputCurrent (low) | Vin = 0.4V, SPCin inputMode | 0.05 | 
____| ImputLoadGurrent_ | 0 < Vin < Voo, Allnputsexcept SPC | -20 


| Leakage Current 
Output and I/O Pins in 
TRI-STATE Input Mode © 


0.4 < Vout < Vcc 


loc Active Supply Current lout = 0, Ta = 25°C (Note 2) ae 140 


T [Woot 08 |v 
=o | | 08 |v 
[as 
[18 v 
[sv 
i ee 
es a 
[ [exe ves |v 
[10 [mc 
ee 
tf 
rio | 00 | mA 


Note 1: Care should be taken by designers to provide a minimum inductance path between the VSS pins and system ground in order to minimize noise.

Note 2: ICC is affected by the clock scaling factor selected by the C and M bits in the CFG register, see Section 3.2.1.

Note 3: VIL min: in the range of -0.5V to -1.5V, the pulse must be < 20 ns, and the period between pulses = 120 ns.

Note 4: VIH max: in the range of VCC + 0.5V to VCC + 2.0V, the pulse must be < 25 ns, and the period between pulses = 120 ns.


68-Pin PCC Package

[Pinout drawing, bottom view; pin assignments not recoverable from the source.]

TL/EE/9424-29
FIGURE 4-1. Connection Diagram


4.4 SWITCHING CHARACTERISTICS

4.4.1 Definitions

All the timing specifications given in this section refer to
15% or 85% of VCC on the rising or falling edges of CTTL,
and all output signals; and to 0.8V or 2.0V on all the TTL
input signals as illustrated in Figures 4-2 and 4-3 unless
specifically stated otherwise.

ABBREVIATIONS:
L.E. - leading edge     R.E. - rising edge
T.E. - trailing edge    F.E. - falling edge

[Timing diagram; CTTL and output signals referenced at 0.15 VCC and 0.85 VCC.]

TL/EE/9424-30
FIGURE 4-2. Timing Specification Standard
(CMOS Output Signals)

[Timing diagram; CTTL against TTL input signals.]

TL/EE/9424-31
FIGURE 4-3. Timing Specification Standard
(TTL Input Signals)


4.4.2 DEVICE TESTING 


[Figure: test equipment (precision digital voltmeter,
programmable current source/sink) connected to the signal
under test, with capacitive loading.]

TL/EE/9424-65
FIGURE 4-4. Test Loading Configuration


TABLE 4-1. Test Loading Characteristics 


High Level Low Level input Load 
Output Voltage |Output Voitage Current 
(lon = —400 pA) (lol = 4mA) | (0 < Vin < Voc) 


HBE, STO-3, U/S 50 pF > 0.90 Voc < 0.10 Voc 
ILO, HLDA 7X, PFS, 

BPU, RSTO, RD, 

WR, TSO, DBE, 

OSCOUT, FCLK, DDIN, 

ADS 


RSTI, HOLD = lhe 50 pF —20 pA < | < 20 PAI2:0V < Vin < Voc + 0.5V)/—0.5V < Vi_ < 0.8V 
Na CWATT T, WAI 


OSCIN | 50pF Vi < 0.3 
ADO-15, A16-28, Nae nt re A Se 

ADO-15 pto0pF | | 20 AS 1h < 20 WAIZOV < Vin < Vor + 0.5V|-O.5V < Vu < 0.8V 
PHIT, PHI2 | 30pF | Von20.90VccVors0toVed | 

SPC Min = 0.4V 


Capacitive High Level Low Level 


Signal Name input Voltage input Voltage 


Loading 
4.4.3 Timing Tables 


4.4.3.1 Output Signals: Internal Propagation Delays, NS32CG16-10 and NS32CG16-15 


Figure Description Reference/Conditions 


tote 4-20 | CTTL Clock Period R.E., CTTL to Next R.E., CTTL se 1000 a ae 1000 


ea 4-20 | CTTLHigh Time 100 pF Capacitive Load 0.5 totp 0.5tcoTp | 0.5 tot 
— 10ns fi: 7ns 


At 1.5V (Both Edges) 
CTTL Low Time At 0.8V 0.5 tcTp | 0-5 tcTp | 0.5 tcTp | 0.5 tcTp 
—-8ns | +6ns | —6ns |] +2ns 


(See Note 1) 
tcTr 4-20 | CTTL Rise Time 15% to 85% Voc on R.E., CTTL Sees po ns 


75 pF Capacitive Load 
50 pF Capacitive Load 


25 pF Capacitive Load 


tcTl 


toTy 4-20 | CTTL Fall Time 85% to 15% Voc on F.E., CTTL ns 


tcLw(1,2) 4-20 | PHI1, PHI2 Pulse Width At 2.0V on PHI1, PHI2 0.5 on 0.5 tcTp = 5 8 0.5 tcTp 
(Both Edges) —10ns; +5ns | —6ns | + 2ns 


toLh Clock High Time At 90% Voc on PHI1, PHI2 0.5 tcTp 0.5 tcTp 
ia aia (Both Edges) = 15 ns | °5 TP | ~ tons | 8 tore 
tnOVL(1,2) PHI1, PHI2, Non-Overlap | At 50% Vcc on PHI, PHI2 _ 
Time 
Xr OSCIN to FCLK 80% Voc on R.E., OSCIN a 
R.E. Delay to R.E., FOLK 
tFor FCLK to CTTL R.E., FCLK to R.E., CTTL 
ns 
R.E. Delay 
tFor FCLK to CTTL R.E., FCLK to F.E., CTTL 
. 4 ns 
F.E. Delay 
= FCLK to PHI R.E., FCLK to R.E., PHIt 
r ns 
R.E. Delay 
tFPr FCLK to PHI R.E., FCLK to F.E., PHI1 
ns 
F.E. Delay 
tpor 4-20 | CTTLandPHH Skew _| R.E., CTTLto R.E., PHI1 ope ns 
tav | 4-5 | Address Bits 0-15 Valid | after R.E., CTTLT1 | | go | to | 30 | sons 
tah | 4-5. | Address Bits 0-15 Hold | after R.E., CTTLT2. Fo o5 | | 5 | | ns 
taty | 4-5 ‘| Address Bits 16-23 Valid | after R.E., CTTLT1 ee ee ns 
tah __| 4-5 _| Address Bits 16-23 Hold | after R.E.,CTTLNext T1 or Ti | ns 
taLnfr 4-5 | ADO-AD15 Active Non-Float after R.E., CTTL T1 


(See Note 2) 


Note 1: Device testing is performed using the Test Loading Characteristics in Table 4.1. Additional timing data for CTTL with various capacitive loads is not 100% 
tested. 


Note 2: tai ntr is address bits 0-15 not floating or active after R.E. CTTL T1. This is only valid if the previous CPU cycle was a read (Figure 4.5). A previous write 
may have ‘‘data” active into T1 of the next cycle which then becomes “address” during T1. 


re 4-7 | ADO-AD15 Floating cacao Gane - a iP - 
(Caused by HOLD) 
wt | 47 | At6-A23Floating | afterRE.cTT | | 25 | | 8s 
tov __| 46,410 | DataValid(Write Cycle) | afterRE,CTrLT20rm | | 60 | | 38 | ns 
ton | 46,410 | DataHold fatter RE, CTTLNextTHorTi| o | | o | | ns 
twos | 45 ns 
tapsw ADS Pulse Width at15% Voc (BothEdges) | 30 | | 2 | | ns 
twost_| 47 | ADSFloatng Sf atterRe.cTm | | 85 | | 40 ns 
tos | 48 | ADSRetumtromFloating | afterRE.CTTLT | | 88 | | 40s 
taanss | 46 | Address Bits0-15Setup | beforeADSTE. | 25 | | 20 | ns 
tatapsn | 45 | Address Bits0-15Hold | afterADST:€. | ts | | 12 | ns 
tuoev | 45 | HBESignalvalid | afterRe,cTTLT! | | 70 | | 88s 
tyoen | 45 | HBESignalHold | afterRE,CTTLNextTtorTi| o | | o | | ns 
tweet | 47 | HBESignaiFloating | afterRe.cTm.T | | 5 | | 40s 
twoer | 48 | HBERetumnfromFloating | afterRe,.cTLT ss |_| 85 | | 0s 
toony | 45 | DDINSignalvalid | afterRe.cTmuT1 | ||| 88 | ns 
topnn_| 45 | DDINSignalHold | afterRE,CTTLNextTtorTi] o | | o | | ns 
ton DDIN Floating afterRE,cTmT | | 85 | | 40s 
tone DDIN Return fromFloating | afterRE,CTTLT |_| 85 | | 40 | ns 
spa SPCOuputactve | afterRE.CTT: | | 85 | 5 | 6s 
tsPoia SPCOutputinactive | afterRE.CTTLTA |_| 85 || 6 | ns 
tsPOnt SPCOutputNonForcing | afterTe,cTTLT4 | | 10 | || 
tHLDAa HUDASignalactive | afterRE.CTLT! | | 50 | || 
tpaia | 48 | HLDASignalinactive | afterRE.cTrLT! | | 50 | | 
= [a frre lemme, | =| 1 *T- 
(before T1, see Note 1) 
tsm | 45 | StatusSTO-sT3Hold | afterRe.cTutT | o =| | oo | os 
tapw_| 45 | BPUSignalvaid | afterRe.crmuT4 | | 5 | | 80s 
tapun_| 45 | BPUSignalHold —fafterRe.cTmLT4 | 10 | | | ns 
Note 1: Every memory cycle starts with T4, during which Cycle Status is applied. If the CPU was idling, the sequence will be: "... Ti, T4, T1 ...". If the CPU was not idling, the sequence will be: "... T4, T1 ...".

Note 2: If the CPU is connected directly to the FPU and the CTTL loading is not violated, the CPU and FPU will function correctly together. The CPU and FPU connect directly without buffers. They should be located less than 4 inches (10 centimeters) apart. tSPCa and tSPCia will track each other on all CPUs and therefore it is not possible to have a minimum tSPCia and a maximum tSPCa value. The pulse width minimum, tSPCw, of the FPU will not be violated by the NS32CG16 when connected directly to the FPU.


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Se 
oO ;o 


tTsOa 


ILO Si ji 
_trstoa 4-22 RSTO Signal Active after R.E., CTTL 




Name Description 


4-5 


m 
> | > 


tpBEa(w) 4-6 


BE Inactive after F.E., CTTL T4 


tDBEia 4-5, 4-6 


oO 
~ 
” 


| TSO Signai Active after R.E., CTTLT2 i2 2 10 ns 
trsoia_| 45 _| TSO Signal Inactive | 2 | o | 1 | ns 
tapa__| 4-5 __| RD Signal Active | 2 | | 5 | ns 
tRDia - RD Signal Inactive 20 ns 
twra - WRS Lo. | 15 ns 
twria | 4-6 | 20 15 | ns 
'pBEa(R) | 4-5 = 


ignal Active after R.E., CTTL T2 
: 4 


U/S Signal Hold after R.E., CTTL T4 


ctive (Write Cycle) after R.E., CTTL T2 


tusv__| 45 | U/SSignalvaid | afterRE,CTTLTA | 0 30__| ns 
tusn | 45 | U/SSignalHold 10 ns 
tersia_| 413. | PFSSignalinactive | afterFE,CTTL ns 
de to Next PFS Clock Cycle | 

| Next Nonsequential Fetch 


tLXPF 4-14 | Last Operand Transfer before R.E., CTTL T1 of | 
of an Instruction to First Bus Cycle of Transfer 


Next PFS Clock Cycle 


UILOs 


4-17 | ILO Signal Setup before R.E., CTTLT1 of | e 
| | | First Interlocked Write Cycle 
tiLon } 4-18 | ILO Signal Hold after R.E., CTTL T3 of Last 40 sie 
, i Interlocked Read Cycle 
tiLoa | | - ns 


ns 
21 


a 


ok ek 
O1 on 
= 
Q 
— 
Ze) 


ns 


— 
OV 


ns 


tRSTOia 4-22 RSTO Signal Inactive after R.E., CTTL 
tRTO! 4-22 Reset to Idle after F.E. of RSTO 
tRTOF 4-22 Reset to Fetch after R.E. of RSTO 


tcTp 


£ 
— 
<@) 


tcTp 


| ILO Signal Active after R.E., CTTL 


4.4.3.2 Input Signal Requirements: NS32CG16-10 and NS32CG16-15 


xp OSCIN Clock Period R.E., OSCIN to Next R.E., OSCIN ns 
txh 4-20 | OSCIN High Time at 80% Voc (Both Edges) re i ae se 
(External Clock) 
tx OSCIN Low Time at20% Vcc (BothEdges) | 16 | | 10 | | ns 
tols | 4-5,4-11 | Data In Setup before R.E., CTTL T4 | i | | 5 | ns 
toin | 4-5, 4-11 | Data In Hold after R.E., CTTLT4 : 5 o 
(see Note 1) | 
tows | 4-5,4-6 | CWAIT Signal Setup beforeR.E.,CTTLT3orTaw) | 20 | | 20 | | ns 
town | 4-5,4-6 | CWAIT Signal Hold afterR.E.,CTTLT30rT3aw) | 5 | | 5 | | ns 
tws 4-5,4-6 | WAITn Signals Setup before R.E., CTTL T3 or T3(w) ae ae ns 
twn | 4-5,4-6 | WAITn Signals Hold afterR.E.,CTTLT30rTaw) | 5 | | 5 | | ns 
tHLDs | 4-7,4-8 | HOLD Setup Time before R.E., CT TL TX2 or Ti ro ao re ns 
tpn | 4-7,4-8 | HOLD Hold Time after R.E., CTTL Ti Lo oe ine 
tpwrR Power Stable to RST| R.E. | after Voc Reaches 4.5V so | | 3] | US 
tasts | 4-21, 4-22 | RSTI Signal Setup before F.E., CTTL | 20 | | 2 | | ns 
tastw RST! Pulse Width at 0.8V (Both Edges) ee ee 
tivts_|__ 423 _| INT Signal Setup before F.E., CTL 4 | | 14 | tctp—2ns| ns 
tiNTh INT Signal Hold after Interrupt Acknowledge Po ee tcTp 
tspcq | 4-12 SPC Pulse Delay after F.E., CTTL T4 | 49 40 ee 
from Slave | | | | | 
tspcs SPC Input Setup before F.E., CTL 37 | | x | ts 
tspcw 4-12 SPC Pulse Width at 0.8V (Both Edges) | on | 20 | aa 
(from Slave Processor) [ oe | 
taDss ADS Input Setup | before F.E,CTTL ee ee ee ee 
tapsh 4-9 ADS Input Hold after F.E., CTTL T1 10 40 ae 
(see Note 2) | | 
toons|__4-9 | DDINInput Setup —_—_| before F.E., CTTL is | | wo | [ns 


Note 1: tDIh is always less than or equal to tRDia.
Note 2: ADS must be deasserted before state T4 of the DMA controller cycle.
4.4.4 TIMING DIAGRAMS

[Timing diagram; waveforms not recoverable from the source.]

FIGURE 4-5. Read Cycle
[Timing diagram; bus states shown: T4 or Ti, T1, T2, T3, T3(W), T3(W), T4,
T1 or Ti. Waveforms and figure caption not recoverable from the source.]
[Timing diagram; signals shown: HOLD, ADS, HBE, DDIN, AD0-AD15, A16-A23.]

TL/EE/9424-34
FIGURE 4-7. HOLD Acknowledge Timing (Bus Initially Not Idle)


Note: When the bus is not idle, HOLD must be asserted before the rising edge of CTTL of the timing state that precedes state T4 in order for the request to be 
acknowledged. 


[Timing diagram; signals shown include HOLD, HLDA and ADS.]

FIGURE 4-8. HOLD Timing (Bus Initially Idle)
[Timing diagram; CPU states and DMAC states shown against CTTL, with WAIT1-2.]

TL/EE/9424-36
FIGURE 4-9. DMAC Initiated Bus Cycle
Note 1: ADS must be deactivated before state T4 of the DMA controller cycle. 


Note 2: During a DMAC cycle WAIT1-2 must be kept inactive to prevent loss of synchronization. A DMAC cycle is similar to a CPU cycle. The NS32CG16 
generates TSO, RD, WR and DBE. The DMAC drives the address/data lines HBE, ADS and DDIN. 


Note 3: During a DMAC cycle, if the ADS signal is pulsed in order to initiate a bus cycle, the HOLD signal must remain asserted until state T4 of the DMAC cycle. 
[Timing diagrams; signals shown: CTTL, AD0-15, ST0-ST3, ADS (high).]

TL/EE/9424-37
FIGURE 4-10. Slave Processor Write Timing

TL/EE/9424-38
FIGURE 4-11. Slave Processor Read Timing

[Timing diagram; signals shown: CTTL, SPC (from CPU), SPC (from FPU).]

TL/EE/9424-39
FIGURE 4-12. SPC Timing

After transferring the last operand to the FPU, the CPU turns OFF the
output driver and holds SPC high with an internal 5 kΩ pullup.


[Timing diagram; waveforms not recoverable from the source.]

TL/EE/9424-40
FIGURE 4-13. Relationship of PFS to Clock Cycles

[Timing diagram; first bus cycle states T1, T2, T3, T4.]

TL/EE/9424-41
FIGURE 4-14. Relationship Between Last Data Transfer of an Instruction and PFS Pulse of Next Instruction
Note: In a transfer of a Read-Modify-Write type operand, this is the Read transfer, displaying RMW Status (Code 1011).

[Timing diagram; ST0-3 shows Code 1001.]

TL/EE/9424-42
FIGURE 4-15. Guaranteed Delay, PFS to Non-Sequential Fetch

[Timing diagram; waveforms not recoverable from the source.]

TL/EE/9424-43
FIGURE 4-16. Guaranteed Delay, Non-Sequential Fetch to PFS

[Timing diagram; waveforms not recoverable from the source.]

TL/EE/9424-44
FIGURE 4-17. Relationship of ILO to First Operand Cycle of an Interlocked Instruction
[Timing diagram; states T3 or Ti, T4 or Ti.]

TL/EE/9424-45
FIGURE 4-18. Relationship of ILO to Last Operand Cycle of an Interlocked Instruction

[Timing diagram; waveforms not recoverable from the source.]

TL/EE/9424-46
FIGURE 4-19. Relationship of ILO to Any Clock Cycle

[Timing diagram; signals shown: OSCIN, FCLK, CTTL, PHI1, PHI2, with
non-overlap times tNOVL(1) and tNOVL(2).]

TL/EE/9424-47
FIGURE 4-20. Clock Waveforms
[Timing diagram; signals shown include CTTL, RSTI and RSTO.]

TL/EE/9424-48
FIGURE 4-21. Power-On Reset

[Timing diagram; signals shown include CTTL, RSTO and AD0-15.]

TL/EE/9424-49
FIGURE 4-22. Non-Power-On Reset
Note 1: During Reset the HOLD signal must be kept high.
Note 2: After RSTI is deasserted the first bus cycle will be an instruction fetch at address zero.

[Timing diagram; waveforms not recoverable from the source.]

TL/EE/9424-50
FIGURE 4-23. INT Interrupt Signal Detection
Note 1: Once INT is asserted, it must remain asserted until it is acknowledged.
Note 2: INTA is the Interrupt Acknowledge bus cycle (not a CPU signal). Refer to Section 3.4.1 and Table 3.4.

[Timing diagram; NMI pulse width tNMIw.]

TL/EE/9424-51
FIGURE 4-24. NMI Interrupt Signal Timing


Appendix A: Instruction Formats

NOTATIONS

i = Integer Type Field
    B = 00 (Byte)
    W = 01 (Word)
    D = 11 (Double Word)

f = Floating Point Type Field
    F = 1 (Std. Floating: 32 bits)
    L = 0 (Long Floating: 64 bits)

op = Operation Code
    Valid encodings shown with each format.

gen, gen 1, gen 2 = General Addressing Mode Field
    See Sec. 2.2 for encodings.

reg = General Purpose Register Number

cond = Condition Code Field
    0000 = EQual: Z = 1
    0001 = Not Equal: Z = 0
    0010 = Carry Set: C = 1
    0011 = Carry Clear: C = 0
    0100 = Higher: L = 1
    0101 = Lower or Same: L = 0
    0110 = Greater Than: N = 1
    0111 = Less or Equal: N = 0
    1000 = Flag Set: F = 1
    1001 = Flag Clear: F = 0
    1010 = LOwer: L = 0 and Z = 0
    1011 = Higher or Same: L = 1 or Z = 1
    1100 = Less Than: N = 0 and Z = 0
    1101 = Greater or Equal: N = 1 or Z = 1
    1110 = (Unconditionally True)
    1111 = (Unconditionally False)

short = Short Immediate value. May contain
    quick: Signed 4-bit value, in MOVQ, ADDQ, CMPQ, ACB.
    cond: Condition Code (above), in Scond.
    areg: CPU Dedicated Register, in LPR, SPR.

0000 = US 

0001 — 0111 = (Reserved) 
1000 = FP 

1001 = SP 

1010 = SB 

1011 = (Reserved) 

1100 = (Reserved) 

1101 = PSR 

1110 = INTBASE 

1111 = MOD 
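
The condition code field listed earlier in these notations can be evaluated mechanically from the PSR flags. The Python sketch below follows that list entry by entry; `condition_true` is a hypothetical helper and the flags are passed as 0/1 keyword arguments rather than extracted from a PSR image.

```python
def condition_true(cond, *, z=0, c=0, l=0, n=0, f=0):
    """Evaluate a 4-bit condition code against PSR flags Z, C, L, N, F."""
    table = {
        0b0000: z == 1,              # EQual
        0b0001: z == 0,              # Not Equal
        0b0010: c == 1,              # Carry Set
        0b0011: c == 0,              # Carry Clear
        0b0100: l == 1,              # Higher
        0b0101: l == 0,              # Lower or Same
        0b0110: n == 1,              # Greater Than
        0b0111: n == 0,              # Less or Equal
        0b1000: f == 1,              # Flag Set
        0b1001: f == 0,              # Flag Clear
        0b1010: l == 0 and z == 0,   # LOwer
        0b1011: l == 1 or z == 1,    # Higher or Same
        0b1100: n == 0 and z == 0,   # Less Than
        0b1101: n == 1 or z == 1,    # Greater or Equal
        0b1110: True,                # Unconditionally True
        0b1111: False,               # Unconditionally False
    }
    return table[cond]
```

Each odd/even pair is complementary, so `condition_true(0b1010, ...)` and `condition_true(0b1011, ...)` never agree for the same flag values.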


Options: in String Instructions

    Field layout: U/W | B | T

    T = Translated
    B = Backward
    U/W = 00: None
          01: While Match
          11: Until Match

Configuration bits in SETCFG instruction:

    [Bit-field figure not recoverable from the source.]


Format 0 (low-order bits 1010)
Bcond            (BR)

Format 1 (low-order bits 0010)
BSR      -0000   ENTER    -1000
RET      -0001   EXIT     -1001
CXP      -0010   NOP      -1010
RXP      -0011   WAIT     -1011
RETT     -0100   DIA      -1100
RETI     -0101   FLAG     -1101
SAVE     -0110   SVC      -1110
RESTORE  -0111   BPT      -1111

Format 2
ADDQ     -000    ACB      -100
CMPQ     -001    MOVQ     -101
SPR      -010    LPR      -110
Scond    -011

Format 3
CXPD     -0000   ADJSP    -1010
BICPSR   -0010   JSR      -1100
JUMP     -0100   CASE     -1110
BISPSR   -0110

Trap (UND) on XXX1, 1000

Format 4
ADD      -0000   SUB      -1000
CMP      -0001   ADDR     -1001
BIC      -0010   AND      -1010
ADDC     -0100   SUBC     -1100
MOV      -0101   TBIT     -1101
OR       -0110   XOR      -1110
Appendix A: Instruction Formats (Continued)

Format 5
MOVS     -0000   BITWT    -1000
CMPS     -0001   TBITS    -1001
SETCFG   -0010   BBAND    -1010
SKPS     -0011   SBITPS   -1011
BBSTOD   -0100   BBFOR    -1100
EXTBLT   -0101   SBITS    -1101
BBOR     -0110   BBXOR    -1110
MOVMP    -0111

No Operation on 1111

Format 6
ROT         -0000   NEG         -1000
ASH         -0001   NOT         -1001
CBIT        -0010   Trap (UND)  -1010
CBITI       -0011   SUBP        -1011
Trap (UND)  -0100   ABS         -1100
LSH         -0101   COM         -1101
SBIT        -0110   IBIT        -1110
SBITI       -0111   ADDP        -1111

Format 7
MOVM        -0000   MUL         -1000
CMPM        -0001   MEI         -1001
INSS        -0010   Trap (UND)  -1010
EXTS        -0011   DEI         -1011
MOVXBW      -0100   QUO         -1100
MOVZBW      -0101   REM         -1101
MOVZiD      -0110   MOD         -1110
MOVXiD      -0111   DIV         -1111

Format 8
EXT      -000    INDEX    -100
CVTP     -001    FFS      -101
INS      -010
CHECK    -011

Trap (UND) on -110 and -111

Format 9
MOVif    -000    ROUND    -100
LFSR     -001    TRUNC    -101
MOVLF    -010    SFSR     -110
MOVFL    -011    FLOOR    -111

Format 10
Trap (UND) Always

Format 11
ADDf        -0000   DIVf        -1000
MOVf        -0001   (Note 1)    -1001
CMPf        -0010   Trap (UND)  -1010
(Note 3)    -0011   Trap (UND)  -1011
SUBf        -0100   MULf        -1100
NEGf        -0101   ABSf        -1101
Trap (UND)  -0110   Trap (UND)  -1110
Trap (UND)  -0111   Trap (UND)  -1111

Format 12*
(Note 2)    -0000   (Note 2)    -1000
(Note 1)    -0001   (Note 1)    -1001
POLYf       -0010   Trap (UND)  -1010
DOTf        -0011   Trap (UND)  -1011
SCALBf      -0100   (Note 2)    -1100
LOGBf       -0101   (Note 1)    -1101
Trap (UND)  -0110   Trap (UND)  -1110
Trap (UND)  -0111   Trap (UND)  -1111

*Instructions with Format 12 are available only when the NS32381 is used.
Format 13 (opcode byte 00011110)
Trap (UND) Always

Format 14
Trap (UND) Always

Format 15
Trap (UND) Always

Format 16
Trap (UND) Always

Format 17 (opcode byte 10001110)
Trap (UND) Always

Format 18
Trap (UND) Always

Format 19
Trap (UND) Always

Implied Immediate Encodings:

 7                    0
 r7 r6 r5 r4 r3 r2 r1 r0
Register Mask, appended to SAVE, ENTER

 7                    0
 r0 r1 r2 r3 r4 r5 r6 r7
Register Mask, appended to RESTORE, EXIT
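
The two register masks are bit-reversed images of each other, which is easy to get wrong when assembling by hand. The Python sketch below builds both masks from a list of register numbers, following the bit orders shown above; `save_mask` and `restore_mask` are hypothetical helper names.

```python
def save_mask(registers):
    """Mask appended to SAVE and ENTER: bit 7 = R7 ... bit 0 = R0."""
    mask = 0
    for r in registers:
        if not 0 <= r <= 7:
            raise ValueError("register number must be 0..7")
        mask |= 1 << r
    return mask


def restore_mask(registers):
    """Mask appended to RESTORE and EXIT: bit 7 = R0 ... bit 0 = R7."""
    mask = 0
    for r in registers:
        if not 0 <= r <= 7:
            raise ValueError("register number must be 0..7")
        mask |= 1 << (7 - r)
    return mask
```

Saving R0, R1 and R7 gives the byte 10000011 for SAVE or ENTER, while restoring the same registers needs 11000001 for RESTORE or EXIT.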


7 0 


Offset/Length Modifier appended to INSS, EXTS 


offset 


Note 1: Opcode not defined; CPU treats like MOVf. First operand has access class of read; second operand has access class of write; f-field selects 32-bit or 64-bit data.

Note 2: Opcode not defined; CPU treats like ADDf. First operand has access class of read; second operand has access class of read-modify-write; f-field selects 32-bit or 64-bit data.

Note 3: Opcode not defined; CPU treats like CMPf. First operand has access class of read; second operand has access class of read; f-field selects 32-bit or 64-bit data.
Physical Dimensions inches (millimeters)

[Package outline drawing; dimensions not recoverable from the source.]

Plastic Chip Carrier (V)
Order Number NS32CG16V-10 or NS32CG16V-15
NS Package Number V68A

LIFE SUPPORT POLICY


NATIONAL’S PRODUCTS ARE NOT AUTHORIZED FOR USE AS CRITICAL COMPONENTS IN LIFE SUPPORT
DEVICES OR SYSTEMS WITHOUT THE EXPRESS WRITTEN APPROVAL OF THE PRESIDENT OF NATIONAL
SEMICONDUCTOR CORPORATION. As used herein:


1. Life support devices or systems are devices or 
systems which, (a) are intended for surgical implant 
into the body, or (b) support or sustain life, and whose 
failure to perform, when properly used in accordance 
with instructions for use provided in the labeling, can 
be reasonably expected to result in a significant injury 
to the user. 


2. A critical component is any component of a life 
support device or system whose failure to perform can 
be reasonably expected to cause the failure of the life 
support device or system, or to affect its safety or 
effectiveness. 


National Semiconductor
Corporation
2900 Semiconductor Drive
P.O. Box 58090
Santa Clara, CA 95052-8090
Tel: (408) 721-5000
TWX: (910) 339-9240


National Semiconductor
GmbH
Westendstrasse 193-195
D-8000 Munchen 21
West Germany
Tel: (089) 5 70 95 01
Telex: 522772


NS Japan Ltd.
Sanseido Bldg. 5F
4-15 Nishi Shinjuku
Shinjuku-Ku,
Tokyo 160, Japan
Tel: 3-299-7001
FAX: 3-299-7000


National Semiconductor
Hong Kong Ltd.
Southeast Asia Marketing
Austin Tower, 4th Floor
22-26A Austin Avenue
Tsimshatsui, Kowloon, H.K.
Tel: 3-7231290, 3-7243645
Cable: NSSEAMKTG
Telex: 52996 NSSEA HX


National Semicondutores
Do Brasil Ltda.
Av. Brig. Faria Lima, 830
8 Andar
01452 Sao Paulo, SP. Brasil
Tel: (55/11) 212-5066
Telex: 391-1131931 NSBR BR


National Semiconductor
(Australia) PTY, Ltd.
21/3 High Street
Bayswater, Victoria 3153
Australia
Tel: (03) 729-6333
Telex: AA32096


National does not assume any responsibility for use of any circuitry described, no circuit patent licenses are implied and National reserves the right at any time without notice to change said circuitry and specifications. 
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Semiconductor 


NS32332-10/NS32332-15 


PRELIMINARY 


32-Bit Advanced Microprocessors 


General Description 


The NS32332 is a 32-bit, virtual memory microprocessor 
with 4 GByte addressing and an enhanced internal imple- 
mentation. It is fully object code compatible with other Se- 
ries 32000® microprocessors, and it has the added features 
of 32-bit addressing, higher instruction execution through- 
put, cache support, and expanded bus handling capabilities. 
The new bus features include bus error and retry support, 
dynamic bus sizing, burst mode memory accessing, and en- 
hanced slave processor communication protocol. The high- 
er clock frequency and added features of the NS32332 en- 
able it to deliver 2 to 3 times the performance of the 
NS32032. 


The NS32332 microprocessor is designed to work with both 
the 16- and 32-bit slave processors of the Series 32000 
family. 


Block Diagram 


[Block diagram; recoverable labels: 20-byte queue, instruction decoder, displacement and immediate extractor.]


Features
■ 32-bit architecture and implementation
■ 4 GByte uniform addressing space
■ Software compatible with the Series 32000 Family
■ Powerful instruction set
  — General 2-address capability
  — Very high degree of symmetry
  — Address modes optimized for high level languages
■ Supports both 16- and 32-bit Slave Processor Protocol
  — Memory management support via NS32082 or
    NS32382
  — Floating point support via NS32081 or NS32381
■ Extensive bus features
  — Burst mode memory accessing
  — Cache memory support
  — Dynamic bus configuration (8-, 16-, 32-bits)
  — Fast bus protocol
■ High speed XMOS™ technology
■ 84 Pin grid array package


[Block diagram, continued; recoverable labels: ADD/DATA, controls and status, address shifter, address register.]
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FIGURE 1 


*Shaded areas indicate enhancements from the NS32032. 
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NS32532-20/NS32532-25/NS32532-30 


National Semiconductor                          PRELIMINARY


NS32532-20/NS32532-25/NS32532-30
High-Performance 32-Bit Microprocessor


General Description

The NS32532 is a high-performance 32-bit microprocessor
in the Series 32000® family. It is software compatible with
the previous microprocessors in the family but with a greatly
enhanced internal implementation.

The high-performance specifications are the result of a
four-stage instruction pipeline, on-chip instruction and data
caches, on-chip memory management unit and a significantly
increased clock frequency. In addition, the system
interface provides optimal support for applications spanning
a wide range, from low-cost, real-time controllers to highly
sophisticated, general purpose multiprocessor systems.

The NS32532 integrates more than 370,000 transistors
fabricated in a 1.25 µm double-metal CMOS technology. The
advanced technology and mainframe-like design of the
device enable it to achieve more than 10 times the throughput
of the NS32032 in typical applications.

In addition to generally improved performance, the
NS32532 offers much faster interrupt service and task
switching for real-time applications.


Features
■ Software compatible with the Series 32000 family
■ 32-bit architecture and implementation
■ 4-GByte uniform addressing space
■ On-chip memory management unit with 64-entry
  translation look-aside buffer
■ 4-Stage instruction pipeline
■ 512-Byte on-chip instruction cache
■ 1024-Byte on-chip data cache
■ High-performance bus
  — Separate 32-bit address and data lines
  — Burst mode memory accessing
  — Dynamic bus sizing
■ Extensive multiprocessing support
■ Floating-point support via the NS32381 or NS32580
■ 1.25 µm double-metal CMOS technology
■ 175-pin PGA package


Block Diagram 


[Block diagram; recoverable label: 4-stage instruction pipeline.]
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ee 
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FIGURE 1 
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Section 2 
Peripherals 


Complete specifications for devices referenced in this 
section can be found in the 1988 Series 32000 Data- 
book. 


National
Semiconductor


NS32081-10/NS32081-15 Floating-Point Units


General Description


The NS32081 Floating-Point Unit functions as a slave
processor in National Semiconductor’s Series 32000®
microprocessor family. It provides a high-speed floating-point
instruction set for any Series 32000 family CPU, while
remaining architecturally consistent with the full two-address
architecture and powerful addressing modes of the Series 32000
microprocessor family.


Block Diagram 


[Block diagram; recoverable labels: micro sequencer, condition and completion.]
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ENTRY 
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Features 


■ Eight on-chip data registers
■ 32-bit and 64-bit operations
■ Supports proposed IEEE standard for binary
  floating-point arithmetic, Task P754
■ Directly compatible with NS32016, NS32008 and
  NS32032 CPUs
■ High-speed XMOS™ technology
■ Single 5V supply
■ 24-pin dual in-line package


[Block diagram, continued; recoverable labels: fraction sequencer, control unit, initiate sequence, execution unit, interface and storage unit, data bus, control bus.]


TL/EE/5234-1
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NS32381-15/NS32381-20 


National 


Semiconductor 


PRELIMINARY 


NS32381-15/NS32381-20 Floating-Point Unit


General Description 


The NS32381 is a second generation, CMOS, floating-point 
slave processor that is fully software compatible with its 
forerunner, the NS32081 FPU. The NS32381 FPU functions 
with any Series 32000 CPU, from the NS32008 to the 
NS32532, in a tightly coupled slave configuration. The per- 
formance of the NS32381 has been increased over the 
NS32081 by architecture improvements, hardware en- 
hancements, and higher clock frequencies. Key improve- 
ments include the addition of a 32-bit slave protocol, an 
early done algorithm to increase CPU/FPU parallelism, an 
expanded register set, an automatic power down feature, 
expanded math hardware, and additional instructions. 


The NS32381 FPU contains eight 64-bit data registers and 
a Floating-Point Status Register (FSR). The FPU executes 
20 instructions, and operates on both single and double- 
precision operands. Three separate processors in the 
NS32381 manipulate the mantissa, sign, and exponent. The 
NS32381 FPU conforms to IEEE standard 754-1985 for bi- 
nary floating-point arithmetic. 

When used with a Series 32000 CPU, the CPU and
NS32381 FPU form a tightly coupled computer cluster. This
cluster appears to the user as a single processing unit. All
addressing modes, including two address operations, are
available with the floating-point instructions. In addition,
CPU and FPU communication is handled automatically, and
is user transparent.

The FPU is fabricated with National’s advanced double-metal
CMOS process. It is available in a 68-pin Pin Grid Array
(PGA) package.


Features
■ Directly compatible with NS32008, NS32016,
  NS32C016, NS32032, NS32C032, NS32332 and
  NS32532 microprocessors
■ Selectable 16-bit or 32-bit Slave Protocol
■ Conforms to IEEE standard 754-1985 for binary
  floating-point arithmetic
■ Early done algorithm
■ Single (32-bit) and double (64-bit) precision operations
■ Eight on-chip (64-bit) data registers
■ (Automatic) power down mode
■ Full upward compatibility with existing 32000 software
■ High speed double-metal CMOS design
■ 68-pin PGA package


FPU Block Diagram


[Block diagram; recoverable labels: condition and completion, exponent processor, entry point generator, opcode decode, initiate sequence, execution unit, interface and storage unit, control bus.]
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FIGURE 1-1 


National 
Semiconductor 


NS32202-10 Interrupt Control Unit 


General Description


The NS32202 Interrupt Control Unit (ICU) is the interrupt
controller for the Series 32000® microprocessor family. It is
a support circuit that minimizes the software and real-time
overhead required to handle multi-level, prioritized
interrupts. A single NS32202 manages up to 16 interrupt sources,
resolves interrupt priorities, and supplies a single-byte interrupt
vector to the CPU.

The NS32202 can operate in either of two data bus modes:
16-bit or 8-bit. In the 16-bit mode, eight hardware and eight
software interrupt positions are available. In the 8-bit mode,
16 hardware interrupt positions are available, 8 of which can
be used as software interrupts. In this mode, up to 16
additional ICUs may be cascaded to handle a maximum of 256
interrupts.

Two 16-bit counters, which may be concatenated under
program control into a single 32-bit counter, are also available
for real-time applications.


Features
■ 16 maskable interrupt sources, cascadable to 256
■ Programmable 8- or 16-bit data bus mode
■ Edge or level triggering for each hardware interrupt with
  individually selectable polarities
■ 8 software interrupts
■ Fixed or rotating priority modes
■ Two 16-bit, DC to 10 MHz counters, that may be
  concatenated into a single 32-bit counter
■ Optional 8-bit I/O port available in 8-bit data bus mode
■ High-speed XMOS™ technology
■ Single, +5V supply
■ 40-pin, dual in-line package


Basic System Configuration
Basic System Configuration 
[System diagram; recoverable labels: NS32016 CPU, master NS32202 ICU, non-cascaded interrupt sources, cascaded NS32202 ICUs, cascaded interrupt sources, INT, GROUP.]


TL/EE/5117-1 
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NS32203-10 


National - 
Semiconductor 


PRELIMINARY 


NS32203-10 Direct Memory Access Controller 


General Description 


The NS32203 Direct Memory Access Controller (DMAC) is 
a support chip for the Series 32000® microprocessor family 
designed to relieve the CPU of data transfers between 
memory and I/O devices. The device is capable of packing 
data received from 8-bit peripherals into 16-bit words to re- 
duce system bus loading. It can operate in local and remote 
configurations. In the local configuration it is connected to 
the multiplexed Series 32000 bus and shares with the CPU, 
the bus control signals from the NS32201 Timing Control 
Unit (TCU). In the remote configuration, the DMAC, in con- 
junction with its own TCU, communicates with I/O devices 
and/or memory through a dedicated bus, enabling rapid 
transfers between memory and I/O devices. The DMAC
provides four 16-bit I/O channels which may be configured as
two complementary pairs to support chaining.


Block Diagram 


BUS INTERFACE LOGIC 
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Features 
■ Direct or Indirect data transfers
■ Memory to Memory, I/O to I/O or Memory to I/O
  transfers
■ Remote or Local configurations
■ 8-Bit or 16-Bit transfers
■ Transfer rates up to 5 Megabytes per second
■ Command Chaining on complementary channels
■ Wide range of channel commands
■ Search capability
■ Interrupt Vector generation
■ Simple interface with the Series 32000 Family of
  Microprocessors
■ High Speed XMOS™ Technology
■ Single +5V Supply
■ 48-Pin Dual-In-Line Package


TL/EE/8701 —1 
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Section 3 
Development Tools 


Complete specifications for devices referenced in this 
section can be found in the 1988 Series 32000 Data- 
book. 


National Semiconductor


SYS32/20 PC Add-in Development Package


■ High Performance, 10 MHz, no-wait state,
  32-bit expansion board for an
  IBM-PC/AT or compatible system
■ An Operating System derived from
  AT&T’s UNIX® System V.3
■ The Series 32000 GNX (GENIX Native
  and Cross-Support) Language tools
  including the Series 32000 assembler,
  linker, monitors and debuggers
■ Hardware that supports the NS32032
  CPU, NS32082 MMU, NS32201 TCU and
  the NS32081 FPU
■ Two available on-board memory
  configurations:
  — 2-MByte RAM
  — 4-MByte RAM
■ Software available on 1.2-MByte floppies
■ Complete support for the following
  application tools:
  — SPLICE
  — National’s Series 32000 Development
    Board Family
  — Compilers for C, FORTRAN/77, Pascal
    and Ada
  — Complete System V Documentation
  — 4.2 “bsd” Utilities
  — Tools for Documentors (TFD), a
    derivative of AT&T’s DWB™ utilities
  — Multiuser environment


TL/C/9250-1


Description


National Semiconductor’s SYS32/20 is a complete,
high performance development package that converts
an IBM-PC/AT or compatible system into an ideal
environment for the support of Series 32000®-based
applications. The SYS32/20 PC Add-In Development
Package allows mainframe-size programs to run on a
personal computer at speeds similar to those of a
VAX 780. The SYS32/20 consists of a 32-bit PC Add-In
board based on the Series 32000 chip set, a complete
port of AT&T’s UNIX® System V.3, specially developed
software that integrates the UNIX and DOS
operating systems, and National’s Series 32000
development tools (GNX).


SYS32/30 


National Semiconductor


SYS32/30 PC-Add-In
Development Package


■ 15 MHz NS32332/NS32382 Add-In board
  for an IBM® PC/AT® or compatible
  system
■ 2-3 MIP system performance
■ No wait-state, on-board memory in 4-, 8-
  or 16-Mbyte configurations
■ Operating system derived from AT&T’s
  UNIX® System V Release 3
■ Multi-user support
■ GENIX™ Native and Cross-Support
  (GNX) language tools. Includes
  assembler, linker, libraries, debuggers
■ Support for other Series 32000®
  development products:
  — SPLICE
  — National’s Series 32000 Development
    Board family
  — Compilers: C, FORTRAN77, Pascal,
    Ada®
■ Easy to use DOS/UNIX interface


TL/EE/9420-1


Product Overview

The SYS32™/30 is a complete, high-performance
development package that converts an IBM PC/AT or
compatible computer into a powerful multi-user system
for developing applications that use National
Semiconductor Series 32000 microprocessor family
components. The SYS32/30 add-in processor board
containing the Series 32000 chip cluster with the
NS32332 microprocessor allows programs to run on a
personal computer at speeds greater than those of a
VAX™ 780. The chip cluster on the processor board
includes the NS32332 Central Processing Unit,
NS32382 Memory Management Unit, NS32C201 Timing
Control Unit and the NS32081 Floating-Point Unit.


Along with the processor board, the SYS32/30 package
contains the Opus5™ operating system. This
operating system is a port of AT&T’s UNIX System V
Release 3, and is derived from GENIX V.3, National
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SPLICE Development Tool 


■ Download capabilities via serial
  connections
■ 256 Kbytes of mappable memory
■ Optional 1-Mbyte memory board,
  expands memory up to 8 Mbytes
■ On-board monitor with power-on
  diagnostics
■ Supports Series 32000 CPUs,
  including:
    NS32332   NS32CG16
    NS32032   NS32C032
    NS32016   NS32C016
    NS32008


1.0 Product Overview 
The SPLICE Development Tool provides a communication
link between a Series 32000 target and a development
system host. This connection allows users to
download and map their software onto target memory
and then debug this software using National
Semiconductor’s debuggers.

SPLICE includes two RS232 serial ports for the sys- 
tem host/terminal. These ports are particularly useful 
for target systems that have no serial ports, such as 
embedded controller designs. 
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Parallel I/O port reserved for future 

  high-speed download capabilities
■ Programmable serial port baud rates
■ CPU bus status test points for logic
  analyzer connections
■ 4 LED indicators for diagnostic results
  and general user applications
■ RESET and NMI push buttons
■ 15 MHz maximum operation


SPLICE is also useful for designs with ROM-based 
software, or designs whose memory portion has not 
yet been built. SPLICE provides 256 Kbytes of SRAM 
which users can map into target memory. Using 
mapped memory considerably reduces software de- 
velopment time. 

SPLICE also uses the target system’s chipset. This 
cost-effective feature is achieved through the use of 
CPU and MMU target cables. 
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Section 4 
Software 


Complete specifications for devices referenced in this 
section can be found in the 1988 Series 32000 Data- 
book. 


National Semiconductor


Series 32000® GENIX™ Native and
Cross-Support (GNX™) Language Tools
(Release 2)


[Tool-flow diagram; recoverable labels: source (C) compiler, source (FORTRAN) compiler, source (Pascal), source (assembly), NASM assembly source cross assembler, NMPC cross compiler.]


■ Implements AT&T’s standard Common
  Object File Format (COFF)
■ Optimizing C Compiler (optional)
■ Optimizing FORTRAN 77 Compiler
  (optional)
■ Pascal Compiler (optional)
■ Series 32000 assembler and linker
■ In-System Emulator Support
■ Interactive remote debugger with helpful
  command interface


Product Overview 


The Series 32000 GNX Language Tools are a set of 
software development tools for the Series 32000 mi- 
croprocessor family. Optional high-level language 
compilers work in conjunction with the standard com- 
ponents to provide tools that can be combined to 
meet a variety of development needs. 


GENIX Native and Cross-Support (GNX) 
Language Tools 


The Series 32000 GNX Language Tools are based on
AT&T’s Common Object File Format (COFF). With
appropriate command-line arguments and when linked
with appropriate libraries, code generated by the GNX
language tools can be executed in any Series 32000
target environment. In addition, these tools can be
used to develop operating-system-independent code
or code designed to run in conjunction with real-time
kernels, such as National’s EXEC and VRTX®/Series
32000. All of National’s new language tools conform
to the COFF file format, thereby ensuring that modules
produced by any one set of tools can be linked
with objects produced by any other set of GNX tools.


■ Available in binary for the VAX™ UNIX®
  4.3 bsd operating system under
  derivatives of the Berkeley operating
  system
■ Available in binary for the VAX/VMS™
  operating system
■ Available in binary on National
  Semiconductor Series 32000 Systems
■ Available in source for porting to other
  operating system environments


[Tool-flow diagram, continued; recoverable labels: object module librarian.]
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GENIX™/V.3 Operating System 


[Feature diagram; recoverable labels: multi-user, multi-tasking, and Assist; transport level interface and transport provider interface; remote file sharing; shared libraries; broad spectrum of program applications.]


■ Derived from AT&T’s System V, Release
  3.0, UNIX® Operating System
■ Demand-paged Virtual Memory
■ Mandatory and Advisory File and Record
  Locking
■ Streams


General Description 


GENIX/V.3 is a port of AT&T’s System V, Release 
3.0, UNIX operating system for the Series 32000® mi- 
croprocessor family. GENIX/V.3 is available in source 
form and can be adapted to serve as the operating 
system on customer-designed Series 32000-based 
systems. 


GENIX/V.3 is a multitasking, multiuser operating sys- 
tem that provides an abundance of programs and utili- 
ties for text processing, program development, and 
system administration. GENIX/V.3 supports a wide 
variety of applications ranging from databases to 
graphics packages available from independent soft- 
ware vendors. 


GENIX/V.3 carries forward all of the enhancements
from System V/Series 32000, such as demand-paged
virtual memory and file and record locking,
while introducing significant new features that support
local area networking.


■ Transport Level Interface and Transport
  Provider Interface
■ Remote File Sharing
■ Shared Libraries
■ Assist
■ C Compiler and Associated Language
  Tools


GENIX/V.3 Features 


Streams 


Streams is a general, flexible facility for the develop- 
ment of communications services within the UNIX op- 
erating system. Streams provides a consistent frame- 
work for the operation of network services (ranging 
from local area networks to individual device drivers) 
under the UNIX kernel. 


National Semiconductor


Series 32000® Real-Time Software
Components VRTX, IOX, FMX and TRACER


VRTX/Series 32000 R&D Package


[Block diagram; recoverable labels: application program; task management, communication and synchronization, and memory allocation.]


■ Real-time executive for Series 32000
  embedded systems
■ Can be installed in any Series 32000
  hardware environment
■ Manages multitasking with priority-based
  scheduler
■ Manages memory pool, mailboxes,
  timing and terminal I/O
■ Can reside in PROM and be located
  anywhere in memory


The VRTX®/Series 32000 executive is the central 
member of a set of silicon software building blocks 
used in Series 32000-based real-time embedded sys- 
tems. The executive manages the multitasking envi- 
ronment and responds to operating system service re- 
quests from application tasks. 


The executive can be used alone or in combination
with the other silicon software components to build a
more complete operating system. The IOX®/Series
32000 and FMX®/Series 32000 components support
a file system that is media-compatible with PC-DOS.
The TRACER™/Series 32000 is an interactive
multitasking debugger that can be used in VRTX-based
systems for debug, download and test.


All the components can reside in PROMs installed in
the target system. They can be placed anywhere in
the address space and make minimal assumptions
about the hardware environment. Small user-written
routines supply information about the local
implementation of interrupts, timers, I/O devices, etc.
Application tasks interface to the components with Series
32000 SVC (Supervisor Call) interrupts, thus code for
the components does not require linking with
user-written code.


[Block diagram, continued; recoverable labels: basic system call handlers, I/O system call handlers.]


■ No requirements for particular timers,
  interrupts or busses
■ Has hooks at key processing points for
  easy customization
■ Comprehensive manuals with many
  examples
■ Hot-line technical support
■ Integrated with interactive multitasking
  debugger (optional)
■ Integrated with PC-DOS compatible file
  system (optional)
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Series 32000® EXEC 
ROMable Real-Time Multitasking
EXECUTIVE 


[Block diagram; recoverable labels: timer manager, dynamic task dispatcher, dynamic channel, memory pool manager.]


■ Provides a multitasking executive for
  real-time applications
■ Supports all Series 32000 CPUs
■ Complete Source Code Package
  — Fully user configurable
  — Hardware independent
■ Extensive user implementation support
  — Unique demo, program introduction
  — C and Pascal interface libraries
  — Sample terminal drivers
  — Integrated with Series 32000
    development boards and monitor


Product Overview 


EXEC is National Semiconductor’s real-time,
multitasking executive for Series 32000 based applications.
Its primary purpose is to simplify the task of
designing application software and to provide a base upon
which users can build a wide range of application
systems. EXEC requires only 2K bytes of RAM and only
4K bytes of ROM and is fully compatible with National
Semiconductor’s Series 32000 family and the Series
32000 development board family.


EXEC allows the user to monitor and control multiple
external events that occur asynchronously in real time,
such as intertask communications, system
resource access based upon task priority, real-time
clock control, and interrupt handling. These functions
greatly simplify application development in such areas
as instrumentation and control, test and measurement,
and data communications. In these applications,
EXEC provides an environment in which systems
programmers can immediately implement software
for their particular application without regard to
the details of the system interaction.


■ ROMable
■ Reconfigurable
■ Real-time clock support for time-of-day
  and event scheduling
■ Allows up to 256 levels of task priority
  which can be dynamically assigned
■ Up to 256 logical channels for task
  communication
■ Free-memory pool control
■ Available for VAX™/VMS™, VAX/UNIX®
  and SYS32™ development environments
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section 5 
Application Notes 


Line Drawing with the 
NS32CG16; NS32CG16 
Graphics Note 5 


1.0 INTRODUCTION 


The Bresenham algorithm, as described in the “Series 
32000® Graphics Note 5” is a common integer algorithm 
used in many graphics systems for line drawing. However, 
special instructions of the NS32CG16 processor allow it to 
take advantage of another faster integer algorithm. This ap- 
plication note describes the algorithm and shows an imple- 
mentation on the NS32CG16 processor using the SBITS 
(Set BIT String) and SBITPS (Set BIT Perpendicular String) 
instructions. Timing for the DRAW_LINE algorithm is given
in Tables A, B and C of the Timing Appendix. The timing 
from the original Bresenham iterative method using the 
NS32CG16 is given in Table D. 

The bit map memory conventions followed in this note are 
the same as those given in the NS32CG16 Reference Man- 
ual and Datasheet, and all lines drawn are monochrome. 
Series 32000 Graphics Note 5, AN-524, is recommended 
reading. 


2.0 DESCRIPTION 

All rasterized lines are formed by sequences of line “slices”
which are separated by a unit shift diagonal to the direction
of these slices. For example, the line shown in Figure 1 is
composed of 7 slices, each slice separated by a unit diagonal
shift in the positive direction. Notice that the slices of the
line vary in length. The algorithm presented in this note
determines the length of each slice, given the slope and the
endpoints of the line.


Depending on the slope of the line, these slices will extend 
along the horizontal axis, the vertical axis or the diagonal 
axis with respect to the image plane (i.e., a printed page or 
CRT screen). If the data memory is aligned with the image 
plane so that a positive one unit horizontal (x-axis) move in 
the image plane corresponds to a one bit move within a byte 
in the data memory, and so that a positive one unit vertical 
(y-axis) move in the image plane corresponds to a positive 
one “warp” (warp = the number of pixels along the major axis
of the bit map) move within the data memory, then the
SBITS and SBITPS instructions can be used to quickly set 
bits within data memory to form the line slices on the image 
plane, as explained in section 3.1. For long horizontal lines, 
the MOVMP (MOVe Multiple Pattern) instruction is more ef- 
ficient than SBITS. This instruction is discussed in section 
3.1 and in the NS32CG16 Reference Manual. 
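
The memory alignment described above amounts to a linear bit address: one step in x moves one bit, one step in y moves one warp. The following Python sketch illustrates that mapping only. The function names, the zero base address, and the choice of bit 0 as the first pixel within a byte are assumptions made for illustration; the NS32CG16 Reference Manual remains the authority on the actual conventions.

```python
def pixel_bit_address(x, y, warp, base_bit=0):
    """Linear bit address of pixel (x, y): x moves one bit, y moves one warp."""
    return base_bit + y * warp + x


def set_pixel(memory, x, y, warp):
    """Set one pixel in a bytearray bit map (assumes bit 0 is the first pixel of a byte)."""
    byte_index, bit_index = divmod(pixel_bit_address(x, y, warp), 8)
    memory[byte_index] |= 1 << bit_index


# A hypothetical 64-pixel-wide, 4-line bit map.
WARP = 64
bitmap = bytearray(WARP * 4 // 8)
set_pixel(bitmap, 9, 2, WARP)
```

A horizontal slice is then a run of consecutive bit addresses, and a vertical slice is a run of addresses spaced one warp apart, which is the distinction the text draws between SBITS and SBITPS.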


2.1 Derivation of the Bresenham SLICE Algorithm 

For the moment, consider only those lines in the X-Y
coordinate system starting at the origin (0,0), finishing at an
integer end point (x,y) and lying in the first partial octant, as in
Figure 2. (The analysis will be extended for all lines in
section 2.2.) The equation for one such line ending at (A,B) is:
$y = mx$,
where
$m = B/A$
is the slope of the line. Note that because the line lies in the
first partial octant, $A > 2B \geq 1$.
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FIGURE 2 


Each pixel plotted can be thought of as a unit square area 
on a Real plane (Figure 3). Assume each pixel square is 
situated so that the center of the square is the integer ad- 
dress of the pixel, and each pixel address is one unit away 
from its neighbor. Then let $A_i$ represent the X-coordinate of
the pixel, as shown in Figure 3. The value of Y at $A_i$ is:


$y = (B/A) A_i$
where y is Real.


Since the address of each pixel plotted must have corre- 
sponding integer coordinates, the closest integer to y is ei- 
ther the upper bound of y or the lower bound. (Recall that 
upper and lower bounds refer to the smallest integer greater 
than or equal to y and the largest integer less than or equal 
to y respectively.) The original Bresenham algorithm was
based on this concept, and had a decision variable within
the main loop of the algorithm to decide whether the next
$y_{i+1}$ was the previous $y_i$ (lower bound) or $y_i + 1$ (upper
bound). For the SLICE algorithm, we are only concerned
with when the value changes to $y_i + 1$, and the length of
the previous slice up to that point.
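
For comparison, the per-pixel decision described above can be sketched in Python as follows. This is a generic textbook form of the first-octant Bresenham loop, not the NS32CG16 code from this note; the function names are hypothetical, and the tie rule (increment when the true y is greater than or equal to the half point) is taken from the wording of this section.

```python
def bresenham_first_octant(a, b):
    """Pixels of the line (0,0)-(a,b), 0 <= b <= a, using an integer decision variable."""
    pixels = []
    y = 0
    d = 2 * b - a              # sign tells whether y at x+1 has reached y + 1/2
    for x in range(a + 1):
        pixels.append((x, y))
        if d >= 0:             # next pixel moves up to y + 1 (upper bound)
            y += 1
            d -= 2 * a
        d += 2 * b
    return pixels


def run_lengths(pixels):
    """Run length of each horizontal slice: pixels in the slice minus one."""
    counts = {}
    for _, y in pixels:
        counts[y] = counts.get(y, 0) + 1
    return [counts[y] - 1 for y in sorted(counts)]


print(run_lengths(bresenham_first_octant(45, 6)))
```

The loop makes one decision per pixel, whereas the SLICE approach only needs one decision per slice, which is where the speed advantage claimed in the introduction comes from.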


FIGURE 1

The line from (0,0) to (45,6) is a first octant line with run lengths 3-7-6-7-6-7-3. Notice that a pixel is plotted before the run begins so that the actual number of
pixels plotted is equivalent to the run length + 1.


FIGURE 3

Y is incremented when the location of the half point is beyond $A_i$, or when
the true value of Y at $A_{i+1}$ is greater than $y_i + 1/2$.

In order for $y_i$ to be incremented along the Y-axis, the true
value of real y at $A_i + 1$ must be greater than or equal to the
halfway point between $y_i$ and $y_i + 1$ (Figure 3). If we let i
increment along the Y-axis, then this half point occurs
when:
$y = 1/2 + y_i$
Or, because $y_i = i$ when incrementing along the Y-axis,
$y = (1 + 2i)/2$.
The real value of x at this point is:
$x = A(1 + 2i)/2B$
using $x = (1/m)y$. The lower bound of this value of x
represents the x-coordinate of the pixel square containing the
half point.
Letting $A_i$ and $A_{i+1}$ be two integer values of x where the
real value of y is greater than or equal to the half point value
$y_i + 1/2$ (Figure 4), then the run length extends from
$(A_i + 1, i + 1)$ to $(A_{i+1}, i + 1)$. The run length can then be
calculated as:

$H_{i+1} = A_{i+1} - A_i - 1$
for i = 0, 1, ..., (B-2). Using the equation for x above, we
can now better define $A_i$ as:

$A_i = (A/2B) + (iA/B)$.
This equation has two real-valued divisions which are not 
suitable for an integer algorithm. However, the equation can 
be broken down so that it only involves an integer-valued 
division and its integer remainder, which is more efficient for 
processing. To do this we must define some intermediary 
integer values: 


Q   = lower[A/B]         {Lower bound of inverted slope}
R   = A mod B            {Integer residue of A modulo B}
M   = lower[A/2B]        {Can also be defined as lower[Q/2]}
N   = A mod 2B           {Integer residue of A modulo 2B}
T_i = (N + 2iR) mod 2B   {Integer residue of (N + 2iR) modulo 2B}

Note: R = A - B*lower[A/B].


FIGURE 4. Run length is calculated as A_{i+1} - A_i - 1. In this example,
the run length is 1.
Using the above values we can now define A_i as,

A_i = lower[(M + N/2B) + (iQ + iR/B)]
A_i = M + iQ + lower[(N + 2iR)/2B]

Therefore, substituting A_i and A_{i+1} into the equation for
H_{i+1}, the intermediate horizontal lengths are,

H_{i+1} = A_{i+1} - A_i - 1
H_{i+1} = {M + (i+1)Q + lower[(N + 2(i+1)R)/2B]} -
          {M + iQ + lower[(N + 2iR)/2B]} - 1
H_{i+1} = Q + lower[(N + 2iR)/2B + 2R/2B] -
          lower[(N + 2iR)/2B] - 1
H_{i+1} = Q - 1 + lower[(T_i + 2R)/2B]

Analyzing the term lower[(T_i + 2R)/2B] it is shown that if
T_i + 2R >= 2B then the term becomes 1, otherwise it becomes
0. This is due to the definition of residue and modulo. The
term T_i is defined as:

T_i = (N + 2iR) - 2B(lower[(N + 2iR)/2B]),

which means that 0 <= T_i < 2B. The same is true for R:

R = A - B(lower[A/B]),

so that 0 <= 2R < 2B. Therefore,

0 <= T_i + 2R < 4B

and,

0 <= (T_i + 2R)/2B < 2.

The only possible integer values for this term are 0 and 1.
The term will equal 0 if T_i + 2R < 2B, and H_{i+1} will equal
Q - 1. It will equal 1 when T_i + 2R >= 2B, and H_{i+1} will
equal Q. The decision variable can now be defined as

testvar = T_i + 2R - 2B.

If testvar >= 0 then the horizontal run length is Q; if testvar <
0 then the run length is Q - 1.


Looking again at the definition of T_i, a recursive relationship
for the testvar can be formed.

T_{i+1} = (N + 2R(i+1)) - 2B(lower[(N + 2R(i+1))/2B])

T_{i+1} = (N + 2iR + 2R) - 2B(lower[(N + 2iR + 2R)/2B])

Since, as shown above, 0 <= (T_i + 2R)/2B < 2 then
lower[(T_i + 2R)/2B] <= 1. In fact, if T_i + 2R < 2B then
lower[(T_i + 2R)/2B] = 0, and if T_i + 2R >= 2B then
lower[(T_i + 2R)/2B] = 1. Therefore, letting T_0 = N,

T_{i+1} = T_i + 2R        if (T_i + 2R) < 2B
T_{i+1} = T_i + 2R - 2B   if (T_i + 2R) >= 2B.

This gives the recursive relationship for testvar:

testvar_{i+1} = testvar_i + 2R
H_{i+1} = Q - 1

if testvar_i < 0. And, if testvar_i >= 0:

testvar_{i+1} = testvar_i + 2R - 2B
H_{i+1} = Q.

These recursive equations allow the intermediate run
lengths to be easily calculated using only a few additions
and compare-and-branches.


The initial run length is calculated as follows:

H_0 = A_0 = lower[A/2B] = M + lower[N/2B] = M

The final run length is similarly calculated as:

H_f = M - 1 if N = 0, else H_f = M.


Thus, the SLICE algorithm calculates the horizontal run 
lengths of a line using various parameters based on the first 
partial octant abscissa and ordinate of the line. The algo- 
rithm is efficient because it need only execute its main loop 
B times, which is a maximum of A/2, if A is normalized for 
the first partial octant. Compare this with the original Bre- 
senham algorithm which always executes its main loop A 
times. 
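
The derivation above can be sketched in C as follows. This is a minimal sketch, not the appendix routine: the function name slice_run_lengths and the runs[] output array are assumptions made for illustration, the line is assumed to start at the origin in the first partial octant (2B <= A), and N is formed from Q and R in the same way as in the appendix listing.

```c
#include <assert.h>

/* Hypothetical helper: writes H_0, H_1, ..., H_f into runs[] and
   returns how many run lengths were written (B + 1).
   The caller must provide room for at least b + 1 entries. */
int slice_run_lengths(int a, int b, int *runs)
{
    int q, r, m, n, testvar, i, count = 0;

    assert(b > 0 && 2 * b <= a);      /* first partial octant only */

    q = a / b;                        /* Q = lower[A/B]            */
    r = a - b * q;                    /* R = A mod B               */
    m = q / 2;                        /* M = lower[A/2B]           */
    n = (q % 2 == 0) ? r : r + b;     /* N = A mod 2B              */
    testvar = n + 2 * r - 2 * b;      /* T_0 + 2R - 2B, T_0 = N    */

    runs[count++] = m;                /* initial run length H_0    */
    for (i = 0; i < b - 1; i++) {     /* intermediate runs         */
        if (testvar < 0) {
            runs[count++] = q - 1;
            testvar += 2 * r;
        } else {
            runs[count++] = q;
            testvar += 2 * r - 2 * b;
        }
    }
    runs[count++] = (n == 0) ? m - 1 : m;   /* final run length H_f */
    return count;
}
```

For the line of Figure 1 (A = 45, B = 6) the parameters are Q = 7, R = 3, M = 3 and N = 9, and the sketch produces the run lengths 3, 7, 6, 7, 6, 7, 3. Each run accounts for one pixel more than its length, so the seven runs cover all 46 pixels of the line.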


2.2 Extended Analysis for All Other Lines 


In section 2.1 the SLICE algorithm was derived for lines
starting at the origin and contained within the first partial
octant (2B <= A). The algorithm is easily extended to
encompass lines
in all octants starting and ending at any integer coordinates 
within the pre-defined bit map. The only modifications nec- 
essary for this extension are those relating to the direction 
of movement and in defining the coordinates A and B. 


In order to extend the algorithm to cover all classes of lines, 
the key parameters used by the algorithm must be normal- 
ized to the first partial octant. Those parameters are the 
abscissa and ordinate displacements and the movement of 
the bit pointer along the line. The abscissa and ordinate 
displacements of the line are normalized to the first octant 
by calculating: 


delta x = x_f - x_s and delta y = y_f - y_s

which represent the abscissa (delta x) and ordinate (delta y)
displacements of the original line. Then, the first octant
equivalents of A and B will be:

A  = maximum {|delta x|, |delta y|}
B' = minimum {|delta x|, |delta y|}
B  = minimum {B', A - B'}
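
As a hedged illustration of this normalization, the sketch below computes A and B from a pair of endpoints. The struct slice_params and the function slice_normalize are hypothetical names introduced here; they do not appear in the appendix programs.

```c
#include <stdlib.h>

/* Hypothetical container for the normalized displacements. */
struct slice_params {
    int a;   /* A = maximum {|delta x|, |delta y|} */
    int b;   /* B = minimum {B', A - B'}           */
};

/* Normalizes the displacements of the line (xs,ys)-(xf,yf)
   to the first partial octant. */
struct slice_params slice_normalize(int xs, int ys, int xf, int yf)
{
    struct slice_params p;
    int dx = abs(xf - xs);                   /* |delta x| */
    int dy = abs(yf - ys);                   /* |delta y| */
    int b_prime = (dx < dy) ? dx : dy;       /* B'        */

    p.a = (dx > dy) ? dx : dy;
    p.b = (b_prime < p.a - b_prime) ? b_prime : p.a - b_prime;
    return p;
}
```

For example, the line from (0,0) to (999,800) gives A = 999 and B' = 800, so B = 199, which satisfies 2B <= A.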
The next step in normalizing the line for the first octant is to 
assign the correct value to the movement parameters. A 
line in the first octant and starting at the origin always has 
horizontal run lengths in the positive direction along the X 
(major) axis, and has diagonal movement one unit in the 
positive X direction and one unit in the positive Y (minor) 
direction. Since the SLICE algorithm calculates the run 
lengths independent of direction, variables can easily be de- 
fined which contain the direction of movement for each slice 
and each diagonal step within the different octants. 


Lines of different angles starting at the origin have slices of 
different angles. For example, a line of angle between 22.5 
degrees and 45 degrees has run lengths that are diagonal, 
not horizontal, and the direction of the diagonal step is hori- 
zontal, not diagonal. Because of this characteristic, it is con- 
venient to break the 8 octants of the X-Y coordinate system 
into 16 sections, representing all of the partial octants. 
Then, re-number these partial octants so that they form new 
octants as in Figure 5. These redefined octants represent
each of the eight angle classes of lines. For example, the
lines in octants 3 and 7 are composed of diagonal (45 degree)
slices in either the positive or negative direction, and
have diagonal steps in the vertical direction. Lines in octants 4
and 8 have run length slices in the vertical direction with
diagonal steps in the horizontal direction with respect to the
X-Y plane.

FIGURE 5. Redefined octants for SLICE algorithm. Notice that some of the
octants are split. The origin is at the center of the drawing. Setting DELX
positive on all lines makes opposite octants equivalent in the table below.


In conclusion, the SLICE algorithm calculates successive 
run lengths in the same manner for lines in each octant. The 
only difference between the octants is the direction of 
movement of the bit pointer after each successive run 
length is calculated. The run lengths and diagonal steps for 
each octant are given in Table I. Figure 5 shows the octants
used by the SLICE algorithm. 


3.0 IMPLEMENTATION OF SLICE USING 
SBITS, SBITPS AND MOVMP 


The NS32CG16 features several powerful graphics instruc- 
tions. The SLICE algorithm described by this application 
note is implemented with three of these instructions: SBITS, 
SBITPS and MOVMP. The SBITS instruction allows a hori- 
zontal string of bits to be set, while the SBITPS instruction 
can set vertical or diagonal strings of bits. The MOVMP in- 
struction, not detailed in this application note, can be used 
to set long strings of bits faster than SBITS when the length 
is more than 200 bits in the horizontal direction. The 
BIGSET.S routine given in the appendix uses this instruction 
in conjunction with SBITS for long lines. These are very use- 
ful instructions for the SLICE run length algorithm, as will be 
shown in section 3.2. 


TABLE I


OCTANT DELA DELB 


DELA-DELX +WARP + DIAG 


DELX |DELY| 1 + (+WARP) +HORZ 
DELX EWARP 


DELX DELAIDELY| DIAG 
+ 1 


DIAGONAL MOVE RUN LENGTH 


If DELX < 0 then the starting and ending coordinates are swapped. This simplifies initialization. 
3.1 SBITS and SBITPS Tutorial

SBITS:

SBITS (Set BIT String) sets a string of bits along the horizontal
axis of a pre-defined bit map. The instruction sets a
string of up to 25 bits in a single execution using four arguments
pre-stored in registers R0 through R3.

R0 = (32 bits) Base address of bit-string destination.
R1 = (32 bits, signed) Starting bit-offset from R0.
R2 = (32 bits, unsigned) Run length of the line segment.
R3 = (32 bits) Address of the string look-up table.


The value of the bit offset is used to calculate the bit num- 
ber within the byte, assuming that the first bit is bit 0 and the 
last bit is bit 7. A maximum of 7 for the starting bit number 
added to a maximum of 25 for the run length requires a total 
of 32 bits. SBITS calculates the destination address of the 
first byte of the 32-bit double word to contain the string of 
set bits by the following: 


Destination Byte = Base Address + Offset DIV 8. 
Then, the starting bit number within the destination byte is: 
Starting Bit = Offset MOD 8. 


SBITS instruction then calculates the address for the 32-bit 
double word within the string look-up table (found in the 
NS32CG16 manual) which will be OR’ed with the 32-bit dou- 
ble word whose starting byte address is Destination Byte, as 
calculated above. The table is stored as eight contiguous 
sections, each containing 32 32-bit double words. Each of 
the eight sections corresponds to a different value of Start- 
ing Bit (Offset MOD 8), which has a possible range of 0 
through 7. The 32 double words in each section correspond 
to each value of the run length (up to 25) added to the 
starting bit offset. 


example:

            Register Contents
  before          after
  R0 = 1000       R0 = 1000
  R1 = 235        R1 = 235
  R2 = 16         R2 = 16
  R3 = $stab      R3 = $stab

Destination Address = 1000 + (235 DIV 8) = 1029
Starting Bit = 235 MOD 8 = 3
Table Address = $stab + 4*(16 + (32*3)) = $stab + 448 bytes
32-bit Mask = 0x0007FFF8

This mask value is OR'ed with the 32-bit double word starting
at byte address 1029 decimal. Notice that the mask
0x0007FFF8 leaves the first 3 bits and the last 13 bits
alone. Thus, a string of 16 bits is set starting at bit number 3
at address 1029 decimal. The contents of the registers are
unaffected by the execution of the SBITS instruction.
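
The address arithmetic of this example can be checked with a short C sketch. It only mirrors the calculations described in the text and is not an emulation of the instruction: the names sbits_fields and sbits_decode are hypothetical, the offset is assumed to be non-negative, and the mask is computed directly rather than read from the look-up table.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical record of the values SBITS derives from R0-R2. */
struct sbits_fields {
    long     dest_byte;     /* Base Address + Offset DIV 8           */
    int      start_bit;     /* Offset MOD 8                          */
    long     table_offset;  /* byte offset of the mask in the table  */
    uint32_t mask;          /* double word OR'ed into the bit map    */
};

struct sbits_fields sbits_decode(long base, long offset, unsigned run_length)
{
    struct sbits_fields d;

    assert(offset >= 0 && run_length <= 25);

    d.dest_byte = base + offset / 8;
    d.start_bit = (int)(offset % 8);
    /* eight sections of 32 double words, 4 bytes per double word */
    d.table_offset = 4L * ((long)run_length + 32L * d.start_bit);
    /* run_length one-bits, shifted up to the starting bit */
    d.mask = (((uint32_t)1 << run_length) - 1u) << d.start_bit;
    return d;
}
```

With base 1000, offset 235 and run length 16 this yields destination byte 1029, starting bit 3, a table offset of 448 bytes and the mask 0x0007FFF8, matching the example above.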


Since the SBITS instruction can set up to 25 bits in one
execution, the run length in R2 can be compared to 25, and
a special subroutine executed if it exceeds 25 bits. The subroutine
will set the first 25 bits, then subtract 25 from the run
length, and compare this to 25 again. This process is repeated
until the run length is no more than 25, in which case
the remaining bits are set and the subroutine returns. The
DRAW_LINE algorithm implemented in this application
note uses this method for strings of fewer than 200 bits.
For horizontal lines greater than 200 pixels in length,
the BIGSET routine is more efficient, as described below.
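
The chunking described above might look like the following in C. This is a sketch under stated assumptions: sbits_emulated is a simple stand-in written for this illustration (it is not the emulation package mentioned in section 3.2), set_long_run is a hypothetical name, and bit 0 is taken to be the least significant bit of the first byte.

```c
#include <assert.h>

#define SBITS_MAX 25   /* most bits one SBITS execution can set */

/* Stand-in for one SBITS execution: sets len (<= 25) bits
   starting at bit number offset from base. */
static void sbits_emulated(unsigned char *base, long offset, unsigned len)
{
    unsigned i;

    assert(offset >= 0 && len <= SBITS_MAX);
    for (i = 0; i < len; i++) {
        long bit = offset + (long)i;
        base[bit / 8] |= (unsigned char)(1u << (bit % 8));
    }
}

/* Sets a horizontal run of any length in chunks of at most 25 bits.
   Returns the number of SBITS executions that were needed. */
int set_long_run(unsigned char *base, long offset, unsigned len)
{
    int calls = 0;

    while (len > SBITS_MAX) {          /* more than one chunk left */
        sbits_emulated(base, offset, SBITS_MAX);
        offset += SBITS_MAX;
        len -= SBITS_MAX;
        calls++;
    }
    sbits_emulated(base, offset, len); /* remaining 0..25 bits */
    return calls + 1;
}
```

A run of 60 bits, for instance, is set with three executions of 25, 25 and 10 bits.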


BIGSET: 


The utility program BIGSET.S is used to draw longer lines, 
more than 200 pixels in length, more efficiently than SBITS. 
BIGSET.S, which is given in the appendix, uses the MOVMP 
instruction (MOVe Multiple Pattern) to set long strings of 
bits. Since MOVMP operates on double-word aligned ad- 
dresses most efficiently, the string is broken up into a start- 
ing string within the first byte, a series of bytes to be set, and 
an ending string which is the leftover bits to be set within the 
final byte. The starting and ending strings of bits, if any, are 
set using the SBITS table with an OR instruction. 


SBITPS: 


SBITPS (Set BIT Perpendicular String) handles both vertical
lines and diagonal lines. This instruction also requires four
arguments pre-stored in R0 through R3. R0, R1 and R2 are
the Base Address, Starting Bit Offset and Run Length respectively,
as for SBITS. R3, however, contains the destination
warp.

Note: The Destination warp is the number of bits along the horizontal length 
of the bit map, or the number of bits between scan lines. It is also 
referred to as the “pitch” of the bit map. Thus, a vertical one-unit 
move in the positive direction would require adding the value of the 
warp to the bit pointer. A diagonal or 45 degree line is drawn when the 
warp is incremented or decremented by one. 


The run length is a 32 bit unsigned magnitude. 
example:
(Assume that the bit map is a 904 x 904 pixel grid.)

            Register Contents
  before          after
  R0 = 1000       R0 = 1000
  R1 = 235        R1 = 235 + (150*904) = 135,835
  R2 = 150        R2 = 0
  R3 = +904       R3 = +904

Destination Address = 1029
Starting Bit Number = 3
Run Length = 150
Warp = +904

As in the example for SBITS, the Destination Address is
1029, with Starting Bit Number = 3. Since the warp in this
example is +904 and the bit map is 904 x 904 bits, the line
is vertical, has a length of 150 pixels and starts at bit number
3 within the byte whose address is 1029 decimal. Unlike
the SBITS instruction, SBITPS alters registers R1 and
R2 during execution. R1 is left at the position of the last bit
set plus the warp, and R2 is left at 0. This is convenient for
drawing the next slice since R1 has been automatically updated
to its proper horizontal position for setting the next bit. The bit
offset in R1 need only be incremented by +1 or -1 to point
to the exact position of the next bit to be set.


Diagonal lines are drawn when the value contained in R3 is
the bit map's warp incremented or decremented by one.

example:
(Assume that the bit map is a 904 x 904 pixel grid.)

            Register Contents
  before          after
  R0 = 1000       R0 = 1000
  R1 = 235        R1 = 235 + (150*905) = 135,985
  R2 = 150        R2 = 0
  R3 = +905       R3 = +905

This example draws a diagonal line with positive slope start- 
ing at bit position 3 in byte 1029. Notice that the new value 
of R1 = 135,985 is exactly 150 pixels offset from the value 
of R1 in the vertical line drawn in the previous example. 
Adding +1 to the warp in this example caused the bit posi- 
tion to move not only in the positive vertical direction, but 
also in the positive horizontal direction, forming a diagonal 
line. 
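
The behavior shown in the two SBITPS examples can be modeled with a few lines of C. This is only a sketch written for illustration, not the emulation package of section 3.2: sbitps_emulated is a hypothetical name, the bit map is addressed relative to its first byte, and the bit offset is assumed to stay non-negative.

```c
#include <assert.h>

/* Stand-in for one SBITPS execution: sets len bits, starting at bit
   number offset and advancing by warp bits after each one.
   Returns the value left in R1 (last bit set plus the warp). */
long sbitps_emulated(unsigned char *base, long offset, unsigned len, long warp)
{
    while (len > 0) {
        assert(offset >= 0);
        base[offset / 8] |= (unsigned char)(1u << (offset % 8));
        offset += warp;    /* warp = pitch: vertical; pitch +/- 1: diagonal */
        len--;
    }
    return offset;
}
```

Called with offset 235, length 150 and warp 904 the sketch returns 135,835; with warp 905 it returns 135,985, the two R1 values given in the examples.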


3.2 Implementation of DRAW_LINE and SLICE on the
NS32CG16


Both a C version of the DRAW_LINE algorithm and an
NS32CG16 assembly version are given in the appendix. The 
C program was implemented on SYS32/20 which uses the 
NS32032 processor. An emulation package developed by 
the Electronic Imaging Group at National was used to emu- 
late the SBITS and SBITPS instructions in C, and also the 
MOVMP instruction used for lines longer than 200 pixels. 
The emulation routines, which cover all NS32CG16 instruc- 
tions not available on other Series 32000 processors, are 
available as both C functions and Series 32000 assembly 
subroutines. 


The DRAW_LINE program was first written in C using the
emulation functions. Once this version was tested and func- 
tional, it was translated into Series 32000 code and further 
optimized for speed. The assembly version uses the Series 
32000 assembly subroutines which emulate the SBITS and 
SBITPS instructions. NS32CG16 executable code was de- 
veloped by replacing the emulation subroutine calls with the 
actual NS32CG16 instruction. The functional and optimized 
code was finally executed on the NS32CG16 processor with 
the aid of the DBG16 debugger for downloading the code to 
an NS32CG16 evaluation board. Timing for lines of various 
slopes is given in the Timing Appendix. 

Most of the optimization efforts are concentrated in the 
main loop of the SLICE algorithm. Since the use of SBITS or 
SBITPS for the run length depends on the slope of the line, 
the code is unrolled for the different octants. This minimizes 
branching within the main loop, and cuts down on overall 
execution time. Also, DRAW_LINE takes advantage of
the NS32CG16’s ability to draw fast horizontal, vertical and 
diagonal lines by separating these lines out from the actual 
Bresenham SLICE algorithm. Therefore, time is not wasted 
for trivial lines on executing the initialization sections and 
main loop sections of the SLICE algorithm. 


Branching within the initialization section is also minimized 
by unrolling the code for each octant. Recall from section 
2.2 that in order to extend the algorithm over all octants, the 
abscissa and ordinate displacements must be normalized to 
the first octant and the run length directions must be modified
to preserve the slope of the line. Partitioning the program
into "octant" modules makes the initialization for each
octant less cluttered with compare-and-branches. Table I
shows that each octant has a unique value for DELA and
DELB (the normalized abscissa and ordinate displacements).
Note that at the beginning of the programs, DELX (that is,
x_f - x_s) is checked for sign, and if negative, the absolute
value function is performed and the starting and ending
points are exchanged. This is done because each octant
module of the SLICE algorithm only cares about the sign of
DELY with respect to coordinate (x_s, y_s). DELX is only important
when initializing DELA or DELB, and in this case, only
the absolute value is needed.


4.0 SYSTEM SET-UP 


NS32CG16 Evaluation Board: 


—NS32CG16 with a 30 MHz Clock 

—256KB Static RAM Memory (No Wait States) 
—2 Serial ports 

—MONCG16 Monitor 

Host System: 

—SYS32/20 running Unix System V 

—DBG16 Debugger 


Software for Benchmarking: 


—START.C Starts timer and calls DRIVER. 

—DRIVER.C Feeds vectors to DRAW__LINE. 

—DRAW__LINE.S Line drawing routine which includes 
SLICE. 

—BIGSET.S Uses MOVMPi to set longer lines. 
Called by DRAW__LINE if length > 
200. 

4.1 Timing 


Timing Assumptions: 
1. No wait states are used in the memory. 
2. No screen refresh is performed. 


3. The overhead referred to as the “driver’’ overhead is the 
time it takes to create the endpoints for each vector. This 
is application dependent, and is not included in the 
Vector/Sec and Pixel/Sec times. 


4. The overhead referred to as the "line drawing" overhead
is the time it takes to set up the registers for the actual
line drawing routine. This overhead comes from the
DRAW_LINE program only and is included in all times.


5. Raw data given in the Timing Appendix for the SBITS, 
SBITPS and MOVMP is the peak performance for these 
instructions. These times do not include line drawing 
overhead or driver overhead. 


The timing for this line-drawing application was done so as 
to give meaningful results for a real graphics application and 
to allow the reader to calculate additional times if desired. 
The routines are not optimized for any particular application. 
All line drawing overhead, such as set-up and branching, is 
included in the given times for Timing Table A, B and C. The 
23 µs driver overhead of the calling routines is not included
in the given times for vectors per second and pixels per
second. Calculation of these values was done by subtracting
the 23 µs out of the average time per vector so that the
given times are only for the processing of the vectors. They 
do not include the overhead of DRIVER.C and START.C 
(refer to these programs in the appendix). 


In addition, the DRAW_LINE algorithm is timed for several
test vectors at various strategic points in the code so that
the reader may verify set-up times or calculate other relevant
times. The program DRAW_LINE.S in the appendix
contains markers (e.g., T1, T2...) for each point at which a
particular time was taken. The program was run using a
driver program (DRIVER.C in the appendix) which consists
of several loops which pass test vectors to the
DRAW_LINE routine. A "return" instruction was placed at
the time marker so that the execution time was only measured
up to that marker. These times are given in the Timing
Appendix Table E and include total execution time up to
each of the markers.


A millisecond interrupt timer on the NS32CG16 evaluation 
board was used to time the execution. For each execution, 
the DRIVER program executed its inner loop over 100 
times, and sometimes over 1000 times, so that an accurate 
reading was obtained from the millisecond timer. The final 
times were divided by this loop count to obtain a “bench- 
mark” time. This benchmark time was divided by the total 
number of lines drawn to obtain an average time per vector. 
The overhead of START.C and DRIVER.C in calling the
DRAW_LINE.S routine was not counted in the average
time per vector or the average time per pixel calculation.
Table E of the Timing Appendix gives the timing for each of
the markers and the conditions under which these times
were taken.


Bresenham's SLICE Algorithm:

1. INITIALIZE PARAMETERS, MAKE NECESSARY ROTATIONS

2. OUTPUT INITIAL RUN LENGTH (H0) IN PROPER OCTANT DIRECTION
   MOVE DIAGONALLY IN APPROPRIATE DIRECTION TO START OF NEXT RUN LENGTH

3. OUTPUT INTERMEDIATE RUN LENGTHS
   COUNT = COUNT - 1
   IF COUNT <= 0 GOTO 4.
   IF TESTVAR < 0 H=Q-1 AND TESTVAR=TESTVAR+2*R
   ELSE H=Q AND TESTVAR=TESTVAR+2*R-2*DELB
   OUTPUT RUN LENGTH OF LENGTH H IN PROPER DIRECTION
   MOVE DIAGONALLY IN PROPER DIRECTION
   GOTO 3.

4. OUTPUT FINAL RUN LENGTH OF LENGTH Hf

5. END

INITIALIZED PARAMETERS

DELA = MAXIMUM OF {|DELX|,|DELY|}
DELB = MINIMUM OF {MINIMUM{|DELX|,|DELY|}, DELA-MINIMUM{|DELX|,|DELY|}}
Q = LOWER[DELA/DELB]
R = DELA-DELB*Q
M = LOWER[Q/2]
N = R (IF Q EVEN)
N = R+DELB (IF Q ODD)
H0 = M (IF DELY>=0 OR N<>0)
H0 = M-1 (IF DELY<0 AND N=0)
Hf = M (IF DELY<0 OR N<>0)
Hf = M-1 (IF DELY>=0 AND N=0)
COUNT = DELB
TESTVAR0 = N+2*R-2*DELB (IF DELY>=0)
TESTVAR0 = N+2*R-2*DELB-1 (IF DELY<0)


5.0 CONCLUSION


The timing for the DRAW_LINE algorithm is a good indication
of the performance of the NS32CG16 in a real application,
something which the datasheet specifications can't always
show. The timing clearly shows that the NS32CG16 is
well-suited for line-drawing applications. Using the SBITS,
SBITPS and MOVMP instructions, fast line drawing is
achieved for lines of all slopes and lengths. The NS32CG16
is an ideal processor for taking advantage of the much faster
SLICE algorithm.


The SLICE algorithm, which calculates run lengths of line
segments to form a complete rasterized line, is much faster
than its Bresenham predecessor which calculates the line
pixel by pixel. The SLICE algorithm executes its main loop
at most half as many times as the original Bresenham
algorithm, which executes its main loop exactly
max{|delx|,|dely|} times for each line.


REFERENCES


J.E. Bresenham, IBM, Research Triangle Park, USA. "Run
Length Slice Algorithm for Incremental Lines", Fundamental
Algorithms for Computer Graphics, Springer-Verlag
Berlin Heidelberg 1985.


N.M. Cossitt, National Semiconductor, "Bresenham's Line
Algorithm Using the SBIT Instruction", Series 32000
Graphics Note 5, AN-524, 1988.


National Semiconductor, NS32CG16 Supplement to the
Series 32000 Programmer's Reference Manual, 1988.
Graphics Image (2000 x 2000 Pixels), 300 DPI

FIGURE 6. Star-Burst Benchmark
This Star-Burst image was done on a 2k x 2k pixel bit map. Each line is
2k pixels in length and passes through the center of the image, bisecting
the square. The lines are 25 pixel units apart, and are drawn using the
DRAW_LINE.S routine. There are a total of 160 lines. The total time for
drawing this Star-Burst is 1.0s on 15 MHz NS32CG16.
TIMING APPENDIX

A. PEAK RAW PERFORMANCE AT 15 MHz

Function                   | Rate*
Horizontal Line (SBITS)    | 9 MBits/s
Horizontal Line (MOVMP)    | 60 MBits/s
Vertical Line (SBITPS)     | 440 kBits/s


*Raw performance does not include any register set-up, branching or other software set-up overhead. 


B. TRIVIAL LINES (Using 1k x 1k Bit Map Grid)

Line Type             | Pixels/Line | Vectors/Sec | Pixels/Sec | Comments**
Horizontal            | 1000        | 13,361      | 13,361,838 | Uses BIGSET.S with MOVMP.
                      | 100         | 24,136      | 2,413,593  | Uses SBITS only.
                      | 10          | 45,687      | 456,870    | Uses SBITS only.
Vertical and Diagonal | 1000        | 424         | 424,000    | Uses SBITPS.
                      | 100         | 3,975       | 397,460    |
                      | 10          | 24,491      | 244,910    |

**Pixels/Sec and Vectors/Sec are measured from start of DRAW_LINE.S only. The 23.128 µs driver overhead was not included in these measurements.


C. ALL LINES (Using the "Star-Burst" Benchmark and the SLICE Algorithm)

Pix/Vector | Vectors/Sec | Pixels/Sec | Total Time* | Comments**
1000       | 318         | 318,165    |             | 250 Lines in Star-Burst
100        | 2,811       | 281,118    |             | 50 Lines in Star-Burst
10         | 14,549      | 145,490    |             | 10 Lines in Star-Burst

Avg. Set-up Time Per Line (Measured from Start of DRAW_LINE Only): 37 µs


D. ALL LINES (Using Original BRESENHAM Iterative Method with SBIT and the Star-Burst Benchmark)

Pix/Vector | Vectors/Sec | Pixels/Sec | Total Time* | Comments**
1000       | 163         | 162,746    |             | 250 Lines in Star-Burst
100        | 1,568       | 158,332    |             | 50 Lines in Star-Burst
10         | 11,547      | 127,021    |             | 10 Lines in Star-Burst

Avg. Set-up Time Per Line (Measured for Line Drawing Routine Only): 30 µs


The Bresenham program used for the above table can be found in the Series 32000® Graphics Application Note 5. 


*Total time is measured from start of execution to finish. It includes all line drawing pre-processing, set-up and branching, and it includes all driver overhead of
DRIVER.C and START.C. This time is a good indication of the pages per minute for the complete Star-Burst benchmark. Vectors/Sec and Pixels/Sec are
measured from start of DRAW_LINE.S only. The 23.712 µs overhead was not included in these measurements.


**Star-Burst benchmark draws an equal number of lines in each octant. DRIVER.C creates vectors that form the Star-Burst image, passing these vectors to
DRAW_LINE.S as they are created. The bit map image can then be downloaded to a printer for a hard copy, as in Figure 6.
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Measurement 
Point 


T2 
T3 


T4 


T5 
T6 
T7 


T8 


T9 


T10 


T11 


T12 


T13 
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T15 


T16 


T17 


Ti8 


TIMING APPENDIX TABLE E 


Octant of Test Vector 
(Refer to Figure 5) 
And Length of Vector 


Any Non-Calculated | Any Octant, Any Length 


23.712 STAR-BURST 
40.056 (0,0,0,999) Vertical, 1000 Pixels/Vector 
41.780 (0,999,0,0) Vertical, 1000 Pixels/Vector 
40.884 0,0,999,0) Horizontal, 1000 Pix/Vect 


Test Vector 
Used 


Measured 
Time/Vector* 


All Octants, 1000 Pixels 


(0, 
43.912 (999,0,0,0) Same 
( 


44.532 0,0,999,999) Diagonal, 1000 Pix/Vect 
(0,0,999, 10) Octant 1 1000 Pix/Vect 


(0,0,999, 10) Octant 1 1000 Pix/Vect 
(0,0,99, 10) 100 Pix/Vect 
(0,0,9,2) 10 Pix/Vect 


(0,0,999, 10) Octant 1 1000 Pix/Vect 
(0,0,99,10) 100 Pix/Vect 
(0,0,9,2) 10 Pix/Vect 


Octant 2 1000 Pix/Vect 
100 Pix/Vect 
10 Pix/Vect 


Octant 2 1000 Pix/Vect 
100 Pix/Vect 
10 Pix/Vect 


(0,0,999,800) 
(0,0,99,80) 
(0,0,9,8) 


Octant 3 1000 Pix/Vect 
100 Pix/Vect 
10 Pix/Vect 


Octant 3 1000 Pix/Vect 
100 Pix/Vect 
10 Pix/Vect 


(500,0, 700,999) 
(50,0,70,99) 


(500,0,700,999) 
(50,0,70,99) 
(5,0,7,9) 


Octant 4 1000 Pix/Vect 
100 Pix/Vect 
10 Pix/Vect 


Octant 4 1000 Pix/Vect 
100 Pix/Vect 
10 Pix/Vect 


(10,0,999,999) 
(10,0,90,99) 
(2,0,8,9) 


(10,0,999,999) 
(10,0,90,99) 
(2,0,8,9) 


Comments 


Overhead of entry into DRAW__LINE when 
not calculating endpoints of line. Application 
dependent. 


Overhead of entry into DRAW__LINE when 
calculating the STAR-BURST vectors. 
Application dependent. 


Average overhead per vertical line 
to start of line draw instruction (SBITPS). 


Average overhead per vertical line with 
negative slope to start of line draw instruction. 


Average overhead per horizontal line to start — 
of line draw instruction. (SBITS and BIGSET). 


Same as above with negative delta < value. 


Average overhead per diagonal line to start 
of line draw instruction (SBITPS). 


Same as above for diagonal line with 
negative delta < value. 


Average overhead per line to first run length 
slice of the SLICE algorithm for octant 1. 


Average overhead per 1000, 100 and 10 pixel 
line through first run length of the SLICE 
algorithm. Dependent on the vector length. 


Average overhead per 1000, 100 and 10 pixel 
line to start of main loop of SLICE algorithm. 
Dependent on the vector length. 


Average overhead per line to first run length. 
Not dependent on vector length. 


Average overhead per 1000, 100 and 10 pixel line 
through first run length of the SLICE algorithm. 
Dependent on the vector length. 


Average overhead per 1000, 100 and 10 pixel 
line to start of main loop of SLICE algorithm. 
Dependent on the vector length. 


Average overhead per line to first run length. Not 
dependent on the vector length. 


Average overhead per 1000, 100 and 10 pixel line 
through first run length of the SLICE algorithm. 
Dependent on the vector length. 


Average overhead per 1000, 100 and 10 pixel line 
to start of main loop of SLICE algorithm. 
Dependent on the vector length. 


Average overhead per line to first run length. Not 
dependent on the vector length. 


Average overhead per 1000, 100 and 10 pixel line 


| through first run length of the SLICE algorithm. 


Dependent on the vector length. 


Average overhead per 1000, 100 and 10 pixel line 
to start of main loop of SLICE algorithm. 
Dependent on the vector length. 


*Each time was measured from start of benchmark execution to the Tx marker in the DRAW_LINE.S program. Thus, the overhead of the calling routine to the
DRAW_LINE routine is T1 = 23.712 µs for the STAR-BURST benchmark. All programs used for timing are included in the Appendix. All times given above are for a
1k x 1k bit map.
/* This program draws a line in a defined bit map using Bresenham's */
/* SLICE algorithm. */

#include <stdio.h>
#include <stdlib.h>

#define xbytes 250
#define warp 2000
#define maxy 1999

unsigned char bit_map[xbytes*maxy];
extern unsigned char sbitstab[];

draw_line(xs,ys,xf,yf)
int xs,ys,xf,yf;
{
int bit,i,j,delx,dely,dela,delb,
    hf,h,h0,testvar,q,r,m,
    n,count,xinc,yinc;

delx=xf-xs;
dely=yf-ys;

/* always draw from left to right */
if (xf-xs<0) {
    xs=xf;
    ys=yf;
    delx=abs(delx);
    dely= -dely;
}

bit=xs+ys*warp;

/* vertical line */
if (delx==0) {
    if (dely>=0) {
        sbitps(bit_map,bit,dely,warp);
        return;
    }
    else {
        sbitps(bit_map,bit,abs(dely),-warp);
        return;
    }
}

/* horizontal line */
if (dely==0) {
    sbits(bit_map,bit,delx,sbitstab);
    return;
}

/* diagonal line */
if (abs(delx)==abs(dely)) {
    if (delx*dely>=0) {
        sbitps(bit_map,bit,abs(dely),warp+1);
        return;
    }
    else {
        sbitps(bit_map,bit,delx,-warp+1);
        return;
    }
}

if (abs(delx)>abs(dely)) {
    if (abs(dely)<(delx-abs(dely))) {
        /* octant 1: horizontal runs */
        dela=delx;
        delb=abs(dely);
        xinc=1;
        if (dely>=0)
            yinc=warp;
        else
            yinc= -warp;

        q=dela/delb;
        r=dela-delb*q;
        m=q/2;
        if (q-2*(q/2)==0)
            n=r;
        else
            n=r+delb;
        if ((dely>=0) || (n!=0))
            h0=m;
        else
            h0=m-1;
        if ((dely<0) || (n!=0))
            hf=m;
        else
            hf=m-1;

        count=delb;
        if (dely>=0)
            testvar=n+2*r-2*delb;
        else
            testvar=n+2*r-2*delb-1;
        sbits(bit_map,bit,h0+1,sbitstab);
        bit=bit+h0+yinc+xinc;
        for(i=count-1;i>0;i--) {
            if (testvar<0) {
                h=q-1;
                testvar+=2*r;
            }
            else {
                h=q;
                testvar+=2*r-2*delb;
            }
            sbits(bit_map,bit,h+1,sbitstab);
            bit=bit+h+yinc+xinc;
        }
        sbits(bit_map,bit,hf,sbitstab);
        return;
    }
    else {
        /* octant 2: diagonal runs */
        dela=abs(delx);
        delb=dela-abs(dely);
        xinc=1;
        if (dely>=0)
            yinc=warp;
        else
            yinc= -warp;

        q=dela/delb;
        r=dela-delb*q;
        m=q/2;
        if (q-2*(q/2)==0)
            n=r;
        else
            n=r+delb;
        if ((dely>=0) || (n!=0))
            h0=m;
        else
            h0=m-1;
        if ((dely<0) || (n!=0))
            hf=m;
        else
            hf=m-1;

        count=delb;
        if (dely>=0)
            testvar=n+2*r-2*delb;
        else
            testvar=n+2*r-2*delb-1;
        sbitps(bit_map,bit,h0+1,yinc+1);
        bit=bit+h0+h0*yinc+1;
        for(i=count-1;i>0;i--) {
            if (testvar<0) {
                h=q-1;
                testvar+=2*r;
            }
            else {
                h=q;
                testvar+=2*r-2*delb;
            }
            sbitps(bit_map,bit,h+1,yinc+1);
            bit=bit+h+yinc*h+1;
        }
        sbitps(bit_map,bit,hf+1,yinc+1);
        return;
    }
}
else {
    /* slope greater than 1: coordinates are rotated for these lines */
    if (abs(delx)<(abs(dely)-abs(delx))) {
        /* vertical runs */
        dela=abs(dely);
        delb=abs(delx);
        yinc=1;         /* one bit in x between runs */
        if (dely>0)
            xinc=warp;
        else
            xinc= -warp;

        q=dela/delb;
        r=dela-delb*q;
        m=q/2;
        if (q-2*(q/2)==0)
            n=r;
        else
            n=r+delb;
        if ((dely>=0) || (n!=0))
            h0=m;
        else
            h0=m-1;
        if ((dely<0) || (n!=0))
            hf=m;
        else
            hf=m-1;

        count=delb;
        if (dely>=0)
            testvar=n+2*r-2*delb;
        else
            testvar=n+2*r-2*delb-1;
        sbitps(bit_map,bit,h0+1,xinc);
        bit=bit+yinc+(1+h0)*xinc;
        for(i=count-1;i>0;i--) {
            if (testvar<0) {
                h=q-1;
                testvar+=2*r;
            }
            else {
                h=q;
                testvar+=2*r-2*delb;
            }
            sbitps(bit_map,bit,h+1,xinc);
            bit=bit+yinc+xinc*(1+h);
        }
        sbitps(bit_map,bit,hf+1,xinc);
        return;
    }
    else {
        /* diagonal runs */
        dela=abs(dely);
        delb=dela-abs(delx);
        yinc=1;
        if (dely>0)
            xinc=warp;
        else
            xinc= -warp;

        q=dela/delb;
        r=dela-delb*q;
        m=q/2;
        if (q-2*(q/2)==0)
            n=r;
        else
            n=r+delb;
        if ((dely>=0) || (n!=0))
            h0=m;
        else
            h0=m-1;
        if ((dely<0) || (n!=0))
            hf=m;
        else
            hf=m-1;

        count=delb;
        if (dely>=0)
            testvar=n+2*r-2*delb;
        else
            testvar=n+2*r-2*delb-1;
        sbitps(bit_map,bit,h0+1,xinc+1);
        bit=bit+h0+(1+h0)*xinc;
        for(i=count-1;i>0;i--) {
            if (testvar<0) {
                h=q-1;
                testvar+=2*r;
            }
            else {
                h=q;
                testvar+=2*r-2*delb;
            }
            sbitps(bit_map,bit,h+1,xinc+1);
            bit=bit+h+xinc*(1+h);
        }
        sbitps(bit_map,bit,hf,xinc+1);
        return;
    }
}
}
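The C listing above depends on two routines, sbits and sbitps, that wrap the NS32CG16 SBITS and SBITPS instructions. To exercise the line logic on a host machine, portable stand-ins along the following lines could be used. This is only a sketch: the names emu_setbit, emu_sbits and emu_sbitps are hypothetical, the table argument of sbits is omitted, and the bit numbering (bit n lives in byte n/8, counted from the least significant bit) is an assumption about the Series 32000 bit-offset convention.

```c
#include <assert.h>

/* Hypothetical host-side stand-ins for SBITS and SBITPS.
   Assumed numbering: bit n of the map is bit (n % 8), counted from
   the least significant bit, of byte (n / 8). */

/* Set a single bit of the bit map. */
static void emu_setbit(unsigned char *base, long bit)
{
    assert(bit >= 0);
    base[bit / 8] |= (unsigned char)(1u << (bit % 8));
}

/* Like sbits(): set len consecutive bits starting at offset bit. */
void emu_sbits(unsigned char *base, long bit, int len)
{
    int i;
    for (i = 0; i < len; i++)
        emu_setbit(base, bit + i);
}

/* Like sbitps(): set len bits, advancing warp bits after each one.
   A warp equal to the scan-line width gives a vertical run, and
   warp + 1 or -warp + 1 gives a diagonal run. */
void emu_sbitps(unsigned char *base, long bit, int len, long warp)
{
    int i;
    for (i = 0; i < len; i++)
        emu_setbit(base, bit + (long)i * warp);
}
```

With these in place, a call such as emu_sbitps(bit_map, bit, dely, 2000) would image the same pixels as the vertical-line case of draw_line, only much more slowly than the hardware instruction.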
_draw_line:


# T1 


- VERT: 


# T2 


- VNEG: 


-HORZ: 


# T4 


alpl: 


National Semiconductor Corporation. 
CTP version 2.4 ~~ draw_line.s -- 
compilation options: -O -S =-KC332 -KF#81 -KB4 


"draw_line.s" 
-comm bit map, 499759 


-set 


-align 4 


enter 


movd 
mova 
subd 
mova 
movd 
subd 
cmpqd 
ble 
movd 
mova 
absd 
negd 


WARP , 2899 
_draw_line 
_sbitstab 


[r3,r4,r5,r6,r7], 


16(fp) ,r4 # 
8(fp),¥r5 
r5,r4 
26(fp) ,r6 
12(fp) ,r7 
r7,xr6 
$(P) ,x4 

- VERT 
16(fp) ,r5 
20(fp),xr7 
r4,r4 
r6,r6 


$WARP,r1 


# 

# 

# 

# 

# 

# 

# 

# 

# 

# 
r7,Y1 # 
# 

# 

# 

# 

# 
_bit_map,rf # 
r6,rxr2 # 
SWARP,r3 # 
# 

] 


a a a 
(9) 


_bit_map,rg # 
r6,r2 # 
r2,r2 # 
$(-WARP) ,¥r3 # 
# 
] 


(ea (24; Fes EGE? 
(2) 


$(2) ,r6 
. DIAG 


_bit_map, rf 
r4,r2 
_sbitstab,r3 


ok 
$299, x2 
bigsl 
25,r2 

4 


SES SHES OH 


r2,r1 


12 


xf=new xs 
yf=new ys 
delx=|delx| 
dely=(-dely) 


ys 
ys*warp 
bit=ys*WARP+xs 
delx=9? 


dely>9? 

if no then warp is neg 
set registers for sbitps 
r2=dely=length of line 
r3=warp 


draw line 


set reg's for sbitps 
r2=(-dely) 
r2=dely=length of line 
r3=warp 


draw line 


dely=$? 

set reg's for sbits 
r4=delx=length 
table pointer 


try sbits 
if not more than 25, skip it 
bigs1:
ok: 


-DIAG: 


# TS 


- DNEG: 


# T6 


ret 
ealign 


addr 
movda 
mova 


sbitps 
exit 
ret 


-align 4 


-SLOPELT1: 


- NEGWARP: 


-INIT1: 


-INIT2: 


r2,r4 
r2,r4 
alpi 
4 
r4,r2 


(23,54, 25,26,27) 
9) 


Placae 
ao 
(9) 


r6,r5 # 
r5,r4 # 
-SLOPELT1 
$($) , x6 # 
- DNEG 
_bit_map,rg # 
r4,r2 # 
SWARP + 1,x3 # 
# 
] 


ia ah a 
(9) 


_bit_map,rg # 
r4,r2 # 
S-WARP + 1,r3 # 
# 
] 


r3,r4,r5,r6,r7 


(2) 


r5,r4 
-SLOPEGT1 
r4,r2 
r5,r2 
~OCTANT2 
$(9) ,r6 
» NEGWARP 
WARP, -4(fp) 
IT1 


Se HEHEHE HEHE OHH 


-WARP, -4 (fp) 


r4,r3 
r5,xr3 
r3,rp 
$-1,r9 
r3,r2 
r5,r2 
r2,r4 
r4,YX2 
$f,x3 
-INIT2 
r5,r2 


aoe 8h: =H SH [He He HE SHH ESE HE 


r2,x7 
r3,tos 
rg,xr2 


St He He: 


r5=|dely| 
|dely|=delx? 


dely>$? 


set reg's for sbitps 
r2=delx=length 
r3=warp+1 for diag 


draw line 


set reg's for sbitps 
r2=delx=lenght 
r3=warp-1 for neg slope 


draw line 


Slope less than 1 
|dely|>delx? 


r2=delx 

delx-|dely| 

|dely{>delx-|dely |? 

1f no, start octantl else octant2 
dely>9? 


pos slope then warp=positive 


warp=negative for neg slope 
calculate parameters 
delx=dela |dely|=delb 
dela/delb=q 

calc m 

m=q/2 

calc r 

delb*q 

r=dela-delb*gq 

set r2 =r 

is r3 odd? 

yes, n=r 

n=r+delb 


pop n 
push q on stack 
r2=m=h@ 
- 2DRAW25:
subd 
sbits 


-MAINLOOP: 

# TS 
cmpqda 
ble 
addqd 
addd 
sbits 
bfc 
cmpd 
blt 
movd 


rg,-8(fp) 
$() ,x7 
-INIT3 
$() ,r6 
~INIT4 
$-1,r2 
-INIT3 


$1,-8(fp) 


$1,r2 
_bit_map,rg 
_Sbitstab,r3 


- 2DONE 
$202 ,xr2 
BIGSET1 
r5,tos 
r2,r5 
$25,r2 


r2,r5 


r2,r1 
r2,r5 

» Z2DRAW25 
r5,r2 
tos,r5 


- 2DONE 
bigset 


r2,r2 
~4(fp),rl 
r4,r4 
r5,rxr3 
r5,55 
r5,xr7 
$(G) , x6 
~INITS 
$-1,r7 


tos,r2 

$1,r2 

r3,tos 
_sbitstab,r3 
_bit_map,rp 
~4(fp) ,r6 
$-1,tos 
$8,2(sp) 

- LASTRUN 


$(f) ,x7 
- CASE2 
$-1,r2 
r4,rxr7 


- 3DRAWLAST 
$292 ,xr2 
BIGSET3 
r2,tos 


ts SE EAE 


$e 34E4HE te 


Se HEHEHE HEHE EHH HEHEHE HE SE SE SE HE 


mem=m=hpartb 

n=? 

dely>$? 

hg=m-1 

hpartb=m-1 

takes care of dashes 
set reg's for sbits 
h#=r2 bit=r1 


set bits if less than 25 


bit=bit+h#+1 


bit=bit+h#+1l+warp 


2*r 

save delb 
delb*2 
n=n+2*r 


testvar=n+2*r+delb*2 


dely>g 


testvar-1 


r2=q=h=run length 
smoothes out line 


push delb=count 
set reg's for sbits 


warp 
count=count-1 
count=9? 


Bresenham slice algorithm 


testvar>9? 


h=q-1 


testvar=testvart+2*r 


set bits if less than 25 
movd r5,tos
movd r2,r5 
movd $25,r2 
- 3DRAW25: 
subd r2,r5 
sbits 
addd r2,r1 
cmpd ¥2,75 
blt - 3DRAW25 
mova r5,r2 
sbits 
addd r2,r1 
mova tos,r5 
mova tos,r2 
br - 3DONE 
BIGSET3;: 
bsr bigset 
- 3DRAWLAST: 
addd r2,r1 # update bit 
. 3DONE: 
addd r6,r1 # bit=bit+warpth+1 
addd $1,xr2 # exit h 
addqd $(-1) ,tos # count=count-1 
cmpqd $(9) ,8(sp) # count=$? 
b1t - eMAINLOOP 
-align 4 
. LASTRUN: 
cmpqd $(9) , tos # pop stack 
movd ~8(fp) ,r2 # hpartb=last run length 
sbits 
bfc - 4DONE # set bits if less than 25 
cmpd $269,r2 
blt BIGSET4 
movda r2,tos 
mova r5,tos 
movd r2,r5 
movd $25,r2 
- 4DRAW25: 
subd r2,r5 
sbits 
addd r2,r1 
cmpd r2,r5 
b1t - 4DRAW25 
movd r5,r2 
sbits 
addd r2,r1 
mova tos,r5 
mova tos,r2 
br - 4DONE 
BIGSET4 : 
bsr bigset 
- 4DONE: 
exit r3,r4,r5,r6,1r7] 
ret (P) 
-align 4 
- CASE2: 
addd r4,r7 # testvar=testvart+2%sr 
subd r5,xr7 # testvar=testvar+2*r-2*delb 
sbits 
bfc - 5DRAWLAST # SET BITS IF LESS THAN 25 
cmpda $290,r2 
bit BIGSETS5 
movd r2,tos 
movd r5,tos 
mova r2,r5 
mova $25,rxr2 
- 5DRAW25:


subd 
sbits 
addd 
cmpd 
bit 
movd 
sbits 
addd 
movda 
movd 


br 
BIGSETS: 


bsr 


- SDRAWLAST : 


- 5DONE: 


- 6DONE: 


addd 


addd 
addqd 
cmpqd 
blt 
cmpqd 
movd 
sbits 
bfc 
bsr 


br 
- 2NEGWARP: 


-2INITI1: 


addr 


br 
-2INIT4: 


-2INIT3: 


$ (9) , tos 


~8(fp) ,r2 


- 6DONE 
bigset 


Se SESH SEES OE 


[eo eerones7e 7) 
(9) 


$(f) , x6 


. 2NEGWARP 
WARP, -4 (fp) 


-2INIT1 


“WARP, -4 (fp) 


$1,-8(fp) 


4 Fe SESE EAH 


update bit 


bit=bit+warpth+1 
update count 
count=$? 


pop stack 
hpartb=last run length 


set bits if less than 25 


draw line in octant 2 
dely>?g? 


pos slope then warp=positive 


warp=negative for neg slope 
calculate parameters 
dela=delx 
delb=delx-|dely| 
dela/delb=q 

calc m 

=q/2 

calc r 

delb*q 

r=dela-delb*q 
push r on stack 


then n=r 
n=r+delb 


pop n 
push q on stack 
r2=m=h9 


set one extra bit for smoothness 


mem=m=hpartb 
n=? 


dely>g? 
hg=m~-1 


hpartb=m-1 
# T19
# T11 


-2INITS: 


- 2MAINLOOP: 

# T12 
cmpqd 
ble 
subd 
addd 
movd 
sbitps 
movd 
addqd 
subd 
adda 
subd 
cmpqd 
bit 
-align 4 

-2LASTRUN: 
cmpqd 
movda 
sbitps 
exit 
ret 
-align 4 

- 2CASE2: 
addd 
subd 
mova 
sbitps 
mova 
addqd 
subd 
subd 
cmpqd 
b1t 
cmpqd 
movd 
sbitps 
exit 
ret 
ealign 4 

-SLOPEGT1: 
mova 
subd 
cmpd 


_bit_map,rp 
34 Cf). x3 
$1,7r3 


$1,r1 
r3,r1 
r4,r4 
tos, r2 
$1, r2 
r5,tos 
r5,r5 
r4,xr7 
r5,xr7 
$(B) ,r6 
-2INITS 
$1,r7 


$1,tos 


$9, (sp) 
- 2LASTRUN 


$() , x7 
- 2CASE2 


$1,r2 
r2,tos 


tos,r2 
$1,r1 

P34 r1 

$1, r2 
$1,tos 
$2,9(sp) 

- 2MAINLOOP 


$(#) , tos 
~8(fp) ,r2 


(r3,r4,r5,r6,r7 
$ (9) 


r4,r7 
r5,r7 
r2,tos 


tos,r2 
$1,r1 
r3,r1 
$(1),tos 
$2, (sp) 

- 2MAINLOOP 
$(9) ,tos 
-8(fp) ,r2 


eran 
(9) 


r5,r2 
r4,r2 
r4,r2 


TE SESE SESE SESE SE SESE HE EE HEHE HE EE SEE SE HEE SE OE OH HEHE 


# 
# 
# 
] 
# 
# 
# 
# 
# 
# 
# 
# U 
# 
# 
# 
# 
J 


# 
# 
# 
# 


set reg's for sbits 


warp=r3 h@=r2 bit=r1 


octant 2 needs diag runs 


draw first run length 


update bit in x direction 


sbitps adds extra warp 


2*r 


q=h=next run length 
set extra bit for smoothness 
push delb=count 


delb*2 
n=n+2*r 


testvar=n+2*r+delb*2 


dely># 


testvar-1 


count=count-1 


count=9? 


Bresenham slice algorithm 


testvar>#? 


h=q-1 


testvar=testvar+2*r 


preserve h 


draw diag line of length h 


renew h 


update bit in x direction 
sbitps adds one warp extra 
exit h to gq 

count=count-1 


count=98? 


pop stack 


hpartb=last run length 
all other reg's set up 


testvar=testvart+2*r 
testvar=testvar+2*r-2*delb 


preserve h 


draw line of length h=q 


renew h 


update bit in x direction 
sbitps adds one warp extra 
update count 


count=$? 


pop stack 


hpartb=last run length 
all other reg's set up 


coordinates are rotated for these lines 


r2=|dely 
dely Zante 


e1x>|dely|-delx? 
br
- 3NEGWARP: 


addr 


.3INIT1: 


~ 3INIT2: 


-3INITS: 


- IMAINLOOP: 


# T15 


cmpqd 
ble 
subd 
addd 
movd 
sbitps 
movd 


. 2OCTANT2 
$ (9) ,r6 

. 3NEGWARP 
WARP, -4 (fp) 
.3INIT1 


-WARP, ~4 (fp) 


r5,xr3 
r4,r5 
r5,xr3 
r3,r@ 
$-1,rp 
r3,r2 
r5,r2 
r2,r4 
r4,r2 
$2,x3 
-3INIT2 
r5,r2 


$1,r2 
-3INIT3 


$1,-8(fp) 


_bit_map,rg 
-4(fp) ,r3 


$1,r1 
r4,r4 
tos,r2 
$1,r2 
r5,tos 
£5,255 
r4,rxr7 
¥r5,277 
$() , x6 
-3INITS 
$1,xr7 


$1,tos 


$9,8(sp) 
. 3LASTRUN 


$(6) ,x7 
.- 3CASE2 
$1,r2 
r4,r7 
r2,tos 


tos,r2 


SSM Bi A 8 SE I SE 


Se 


SEAESEAESE EOE OE HEHE FE HEE HEHEHE HEHEHE HEE HEHE 


if no, start octantl else octant2 
dely>$? 


pos slope then warp=positive 


warp=negative for neg slope 
calculate rotated parameters 


dela=|dely| 
delb=delx 

dela in r4 
dela/delb=q 
calc m 

m=q/2 

calc r 

delb*g 
r=dela-delb*q 
push r on stack 


then n=r 
n=r+delb 


pop n 
push q on stack 
r2=m=hg 


set one extra bit for smoothness 


mem=m=hpartb 
n=$? 


dely>9? 
hg=m-1 


hpartb=m-1 


set reg's for sbits 
warp=r3 h@=r2 bit=rl 


draw first run length 
update bit in x direction 


2*r 
q=h=next run length 


set extra bit for smoothness 


push delb=count 
delb*2 

n=n+2*r 
testvar=n+2*r+delb*2 
dely>g 


testvar-1 


count=count-1 
count=9? 


Bresenham slice algorithm 


testvar>$? 


h=q-1 
testvar=testvart2*r 
preserve h 


draw vert line of length h 


renew h 
addqd $1,r1 #
addd $1,r2 # 
subd $1,tos # 
cmpqd $2,9 (sp) + 
blt - 3MAINLOOP 
-align 4 

- 3LASTRUN: 
cmpqd $(%) , tos # 
mova ~8(fp) ,r2 # 
sbitps # 
exit a a ae 
ret (9) 
-align 4 

- JCASE2: 
adda r4,r7 # 
subd r5,r7 # 
movd r2,tos # 
sbitps # 
movd tos, r2 # 
addqd $1,r1 # 
subd $(1) ,tos # 
cmpqd $2, (sp) # 
_blt . 3SMAINLOOP 
cmpqd $() , tos # 
movd -8(fp),r2 4 
sbitps # 
exit r3,r4,r5,r6,r7] 
ret (9) 
-align 4 

. 2OCTANT2: # 
cmpqd $(f) ,x6 # 
bgt . 4NEGWARP 
addr WARP, -4 (fp) # 
br -4INITL 

- 4NEGWARP: 
addr ~WARP, ~—4 (fp) # 

~-4INIT1: # 
movd r5,x3 # 
movd r5,r4 # 
movd r2,xr5 # 
quow r5,xr3 # 
mova r3,rp # 
ashd $(-1) ,xrp # 
movd r3,r2 # 
mulw r5,r2 # 
subd r2,r4 # 
mova r4,r2 # 
tbhitb $9,x3 
bfc -4INIT2 # 
addd r5,r2 # 
ealign 4 

~-4INIT2: 
mova r2,xr7 # 
movd r3, tos # 
mova Y¥Q,x2 # 
addqd $1,r2 # 
movd rg,-8(fp) # 
cmpqd $(P) ,r7 # 
bne -4INIT3 
cmpqd $(9) ,r6 # 

lt -4INIT4 

subd $1,r2 # 
br -4INIT3 

-4INIT4: 
subd $1,-8(fp) # 

-4INIT3: 


update bit in x direction 
exit h to q 

count=count-1 

count=9? 


pop stack 
partb=last run length 
all other reg's set up 


testvar=testvar+2*r 
testvar=testvar+2*r-2%*delb 
preserve h 

draw line of length h=q 
renew h 

update bit in x direction 
update count 

count=$? 


pop stack 
2 pghedtidag run length 
all other reg's set up 


draw line in octant 2 
dely>$? 


pos slope then warp=positive 


warp=negative for neg slope 
calculate parameters 
dela=delx 

dela into r4 
delb=delx-|dely| 
dela/delb=q 

calc m 

m=q/2 

calc r 

delb*q 

r=dela-delb*gq 

push r on stack 


then n=r 
=r+delb 


pop n 

push q on stack 

r2=m=h9 

set one extra bit for smoothness 
mem=m=hpartb 

n=9? 


dely>$? 
hg=m-1 


hpartb=m-1 
# T16
# T17 


-4INITS: 


- 4MAINLOOP: 


# T18 


cmpqd 
ble 
subd 
addd 
movd 
sbitps 
movd 
subd 
addd 
subd 


cmpqd 
blt 


ealign 4 


-4LASTRUN: 


_bit_map,rg 
-4(fp) ,xr3 
$1,7r3 


$1,r1 
r4,r4 
tos ,r2 
$1,r2 
r5,tos 
r5,x5 
r4,r7 
r5,x7 
$(f) , x6 
-4INITS5 
$1,r7 


$1,tos 


$2, 2(SP) 
. 4LASTRUN 


$(9) ,r7 
. 4CASE2 
$1,r2 
r4,r7 
r2,tos 


tos,r2 
$1,r1 
$1,r2 
$1,tos 

$2, 2(sp) 

- 4MAINLOOP 


SE AEMESESEBENESESE OE OEE EHEC MESES SEER SESE SE OE OE ETE 


$ (2) , tos # 
~8(fp),r2 # 
$1,r2 
# 
] 


a ad as 
(9) 


r4,r7 
r5,r7 
r2,tos 


Sb: 56 He SH SEE SHE SEE 


- 4MAINLOOP 


$(9),tos # 
-8(fp) ,r2 # 
$1,r2 
# 
J 


r3,r4,r5,r6,r7 


(9) 


set reg's for sbits 
warp=r3 hf§=r2 bit=rl 
octant 2 needs diag runs 


draw first run length 
update bit 


2*r 
q=h=next run length 


set extra bit for smoothness 


push delb=count 
delb*2 

n=n+2*r 
testvar=n+2*r+delb*2 
dely>g 


testvar-1 


count=count-1 
count=$? 


Bresenham slice algorithm 
testvar>g? 


h=q-1 

testvar=testvart+2%*r 
preserve h 

draw diag line of length h 
renew h 

sbitps adds one warp extra 
exit h to q 

count=count-1 

count=9? 


pop stack 

hpartb=last run length 

all other reg's set up 
testvar=testvart+2*r 
testvar=testvart+2*r-2%*delb 
preserve h 

draw line of length h=q 
renew h 

sbitps adds one warp extra 
update count 

count=9? 

pop stack 


hpartb=last run length 


all other reg's set up 




TL/EE/9663-19 


: BIGSET.S uses MOVMP and the OR instructions to set long horizontal lines 


bigset: 


-globl 
save 
mova 
ashd 
addd 
andd 


bigset 
{r$,r1,r2,r3,r4,r5,r6] 
ri1,r4 

$-3,r4 

r4,rg 

$7,r1 


#save registers we will affect 

#get current bit offset 

#divide by eight to get byte offset 
#add in base. rf is new base pointer 
#mask off msb's of bit pointer to 
#get bit = bit offset mod 8 


#Now we have true base address and bit offset within base. Now we will move 
#to double word alignment. This speeds up the MOVMPD for long bit sequences. 


mvM: 


shrt: 


shrtil: 


restore 
ret 


-align 


restore 
ret 


3,r4 
rg,r4 
$3,r4 
$3,r4 
r1,r4 
r4,r2 
shrt 
$32,r4 


mvm 

r1,r5 

$5,xr5 

r4,r5 

r3(r5:d] ,f(rp) 
$3,r9 


4,x6 

r4,r2 

r3,xr5 
$-5,r2 

1928 (r3) ,x3 
4,r1 


$ox1f,r4 
r5[r4:a],f(rp) 
Ear errr are 


4 

$32,r2 

shrtl 

r1,r4 

$5,r4 

r2,r4 

r3(r4:a},9(r#) 
ania a a 


4 
1929 (x3) ,9(rP) 
Apne een eee 


#place mask in r4 

#get low two bits of address 

#and get bytes left to alignment 

#rem += 1 (for the byte we are on) 
#rem *= 8 to get bits to alignment 
#subtract current bit offset 

#is this more than number of bits left 
#it is, do it the short way 

#if we are already double aligned, go 
#do the MOVMPD 


#calculate index into table 
#index = 32 * bit offset 

#index += run length 

#or in required bits 

#clear last two bits, and 

#bump to next double 

#zap sp'd bits off 

#save run length for a minute 
#and save pointer to table 

#r1 = r1 / 32 = number of doubles 
#get source pattern from table 
#increment is r1 . 
# yes, use instruction 

#mask off all but last 32 bits 
#insert the last few bits 
#restore saved registers 


#check to see if it is exactly 
#32 bits. If it is, branch. 
#calculate index into table 
#index = 32 * bit offset 
#index += run length 

#or in required bits 

#restore saved registers 


#copy last entry of table 
#(all 32 bits) and restore 
/* Program driver.c feeds line vectors to LINE_DRAW.S forming Star-Burst. */

#include <stdio.h>
#define xbytes 250
#define maxx 1999
#define maxy 1999

unsigned char bit_map[xbytes*maxy];

main()
{
int i,count;
/* generate Star-Burst image */
for (count=1;count<=1999;count++) {
    for (i=0;i<=maxy;i+=25)
        draw_line(0,i,maxx,maxy-i);

    for (i=0;i<=maxx;i+=25)
        draw_line(i,maxy,maxx-i,0);
    }
}


/* Start timer and call main procedure of DRIVER.C to draw lines */ 


start() { 
long *timer = (long *) §x699; 
*timer = 0; /* write a zero to timer location */

main(0,0); /* Show argc as zero, argv -> 0 */
return (*timer); /* return, in r0, the current time */
}
Drawing Circles with the NS32CG16
NS32CG16 Note 1


1.0 INTRODUCTION 


The NS32CG16 is a 32-bit CMOS, graphics oriented processor.
It is software compatible with other Series 32000®
CPUs, with new instructions for high-speed graphics. The
NS32CG16 is designed specifically for page-oriented printing
technologies such as laser, LCS, LED, Ion-Deposition,
and Ink Jet.


In this applications note, a method for high-speed circle 
generation will be described, using an optimized version of 
Bresenham’s circle algorithm. 


2.0 DESCRIPTION 


A circle can be described by the center coordinates (xc, yc),
the radius (r), and the width (w). With the Pythagorean
theorem, pixels along the path described by the equation:

    (x - xc)^2 + (y - yc)^2 = r^2

can be set for a width of w perpendicular to the tangent of
the arc.


This, however, involves substantial computation for each 
point on the line. Even taking advantage of the symmetry of 
circles, a large number of instructions must be executed to 
calculate the path. 


Bresenham’s circle algorithm works by determining which of
two pixels is nearer the actual circle at each step. Then,
using symmetry, eight points on the circle’s path can be
determined. Applying the width (w) to each of these eight
points yields a displayed (or imaged) circle. For the actual
derivation of Bresenham’s algorithm, see Reference 7 and
Reference 2. This derivation was done by J. Michener.


Bresenham’s algorithm can be implemented in the following
manner:

1. Select the first position for display as

       (x_1, y_1) = (0, r)

2. Calculate the first parameter as

       p_1 = 3 - 2r

   If p_1 < 0, the next position is (x_1 + 1, y_1). Otherwise, the
   next position is (x_1 + 1, y_1 - 1).

3. Continue to increment the x coordinate by unit steps, and
   calculate each succeeding parameter p from the preceding
   one. If for the previous parameter we found that p_i < 0
   then

       p_{i+1} = p_i + 4x_i + 6

   Otherwise (for p_i >= 0),

       p_{i+1} = p_i + 4(x_i - y_i) + 10

   Then, if p_{i+1} < 0 the next point selected is (x_i + 2, y_{i+1}).
   Otherwise, the next point is (x_i + 2, y_{i+1} - 1). The y
   coordinate is y_{i+1} = y_i if p_i < 0, or y_{i+1} = y_i - 1 if
   p_i >= 0.

4. Repeat the procedures in step 3 until the x and y coordinates
   are equal.
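As a worked check of the steps above, the following sketch records the first-octant points instead of imaging them. The function name circle_octant and its output arrays are hypothetical and are not part of the original routines. For a radius of 10 the parameter starts at p = 3 - 20 = -17 and the points produced are (0,10), (1,10), (2,10), (3,10), (4,9), (5,9), (6,8), (7,7).

```c
#include <assert.h>

/* Trace one octant of a circle with the decision parameter p.
   Stores up to max points in x[] and y[] and returns the number
   of points produced.  Hypothetical helper for illustration. */
int circle_octant(int radius, int *x, int *y, int max)
{
    int xi = 0;                 /* step 1: start at (0, r) */
    int yi = radius;
    int p = 3 - 2 * radius;     /* step 2: first parameter */
    int n = 0;

    assert(radius >= 0);
    while (xi <= yi && n < max) {   /* step 4: stop once x passes y */
        x[n] = xi;
        y[n] = yi;
        n++;
        if (p < 0)
            p += 4 * xi + 6;            /* step 3, p < 0: y unchanged */
        else {
            p += 4 * (xi - yi) + 10;    /* step 3, p >= 0: y steps down */
            yi--;
        }
        xi++;
    }
    return n;
}
```

Each recorded point stands for eight pixels once the symmetry of the circle is applied, so only about r/1.4 iterations are needed for a circle of radius r.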

3.0 IMPLEMENTATION 


With the path of the circle described, the pixels along the 
path can be set using the basic symmetry of the circle. Fol- 
lowing is an example of Bresenham’s circle algorithm in the 
C language, based on Michener’s derivation. 
circle(xc, yc, radius, width)
register unsigned int xc,yc,radius,width;
{
    register int y, x, p;
    x = 0;
    y = radius;
    p = 3 - 2 * radius;
    while (x < y) {
        setgrp(xc,yc,x,y,width);
        if (p < 0)
            p += 4 * x + 6;
        else {
            p += 4 * (x - y) + 10;
            y--;
        }
        x++;
    }
    if (y == x)
        setgrp(xc,yc,x,y,width);
}


TL/EE/9664—1 


setgrp(xc,yc,x,y,width)
register int xc,yc,x,y,width;
{
    if ((y - x) <= (width / 2)) {
        /* near a section boundary: image the crossing lines as well */
        vset(xc + y, yc + x,width);
        vset(xc - y, yc + x,width);
        vset(xc + y, yc - x,width);
        vset(xc - y, yc - x,width);
        hset(xc + x, yc + y,width);
        hset(xc - x, yc + y,width);
        hset(xc + x, yc - y,width);
        hset(xc - x, yc - y,width);
    }
    hset(xc + y, yc + x,width);
    hset(xc - y, yc + x,width);
    hset(xc + y, yc - x,width);
    hset(xc - y, yc - x,width);
    vset(xc + x, yc + y,width);
    vset(xc - x, yc + y,width);
    vset(xc + x, yc - y,width);
    vset(xc - x, yc - y,width);
}

TL/EE/9664-2


The setgrp routine in the previous example uses symmetry
to set eight points of the circle. Setgrp has a special case to
handle the boundaries of the eight sections. When the distance
between the boundaries is less than half the width of
the circle, both vertical and horizontal lines are imaged for
each section. The vset routine sets width pixels vertically in
the image, centered around the second argument. The hset
routine sets width pixels horizontally, centered around the
first argument. Since these cases are so well defined, the
NS32CG16 instructions SBITPS and SBITS are used for
these routines.
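For readers without the hardware, the behaviour described for vset and hset can be sketched in portable C. This is an illustration under stated assumptions, not the NS32CG16 implementation: the small 64 x 64 page, the helpers set_pixel and get_pixel, and the clipping at the page edges are all hypothetical, and the bit numbering (least significant bit first within each byte) is assumed.

```c
#include <assert.h>

#define MAP_BITS_PER_ROW 64     /* assumed scan-line width in bits */
#define MAP_ROWS         64     /* assumed number of scan lines */

static unsigned char page[MAP_BITS_PER_ROW / 8 * MAP_ROWS];

/* Set one pixel; coordinates outside the page are ignored. */
static void set_pixel(int x, int y)
{
    long bit;
    if (x < 0 || x >= MAP_BITS_PER_ROW || y < 0 || y >= MAP_ROWS)
        return;
    bit = (long)y * MAP_BITS_PER_ROW + x;
    page[bit / 8] |= (unsigned char)(1u << (bit % 8));
}

/* Return 1 if the pixel is set, 0 otherwise (including off-page). */
int get_pixel(int x, int y)
{
    long bit;
    if (x < 0 || x >= MAP_BITS_PER_ROW || y < 0 || y >= MAP_ROWS)
        return 0;
    bit = (long)y * MAP_BITS_PER_ROW + x;
    return (page[bit / 8] >> (bit % 8)) & 1;
}

/* Set width pixels horizontally, centered around x. */
void hset(int x, int y, int width)
{
    int i;
    assert(width >= 0);
    for (i = 0; i < width; i++)
        set_pixel(x - width / 2 + i, y);
}

/* Set width pixels vertically, centered around y. */
void vset(int x, int y, int width)
{
    int i;
    assert(width >= 0);
    for (i = 0; i < width; i++)
        set_pixel(x, y - width / 2 + i);
}
```

The start of each run is the center coordinate minus half the width (rounded down), which mirrors the way the assembly routines subtract the stored half width before issuing SBITS or SBITPS.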


The NS32CG16 implementation is very much like the C version,
but is optimized for speed. Note the use of the ADDR
instruction to do the two p_i computations, each in one line of
32000 assembly code.
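The reason one ADDR suffices is that a scaled-index operand of the form disp(base)[index:d] evaluates to disp + base + 4 * index, which is exactly the shape of both parameter updates. The following sketch restates that arithmetic in C; the function names are hypothetical and exist only to make the mapping explicit.

```c
#include <assert.h>

/* Value of an operand written disp(base)[index:d]:
   the :d scale factor multiplies the index by 4. */
long addr_scaled_d(long disp, long base, long index)
{
    return disp + base + 4 * index;
}

/* p < 0 case: p += 4 * x + 6, as in  addr 6(r6)[r4:d],r6 */
long next_p_neg(long p, long x)
{
    return addr_scaled_d(6, p, x);
}

/* p >= 0 case: p += 4 * (x - y) + 10, as in  addr 10(r6)[r7:d],r6
   where r7 already holds x - y */
long next_p_nonneg(long p, long x, long y)
{
    return addr_scaled_d(10, p, x - y);
}
```

For example, starting from p = -17 at x = 0 the first update gives -11, and from p = 13 at (3, 10) the second gives -5.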
        .data
xwarp:  equ     2544            #bits of xwarp to get to next scan
        .comm   _page,4
hlfwdth:.double 0
        .text
#
#Bresenham's circle algorithm, as expressed in "Computer Graphics" by
#Donald Hearn and M. Pauline Baker (1986, Prentice-Hall,
#ISBN 0-13-165382-2)
#
# Inputs:
#       r0 = x coordinate of centre of circle
#       r1 = y coordinate of centre of circle
#       r2 = width (in pixels)
#       r3 = radius (in pixels)
#
# Outputs:
#       no registers altered
#       circle drawn in ram
#
#Notes:
#       This routine uses two special case line drawing routines:
#               a horizontal case (called HLINE)
#               a vertical case (called VLINE)
#       A general purpose line drawing algorithm could be used, however
#       the new 32CG16 instructions are much faster.
#       If the line is to have a width of > 25 pixels, the BIGSET algorithm
#       must be added to the HLINE routine. No other changes are required.
#
circle: save    [r4,r5,r6,r7]   #save our working registers
        movd    r2,r7           #get current width
        lshd    $-1,r7          #divide by two
        movd    r7,hlfwdth      #and store it away
        movqd   0,r4            #x1 = 0
        movd    r3,r5           #y1 = radius
        movqd   3,r6            #p = 3 - (radius * 2)
        subd    r3,r6
        subd    r3,r6
        br      cirtest
        .align  4
cirlp:  bsr     setgrp          #set a group of points
        cmpqd   0,r6            #is p less than zero?
        blt     pge0            #no, it is not. skip
        addr    6(r6)[r4:d],r6  #p += 4 * x1 + 6
        addqd   1,r4            #x1 ++
cirtest:cmpd    r4,r5           #is x1 <= y1 ?
        ble     cirlp           #it is. Loop
        br      cirot1
        .align  4
pge0:   movd    r4,r7           #t = x1
        subd    r5,r7           #t = x1 - y1
        addr    10(r6)[r7:d],r6 #p += 4 * (x1 - y1) + 10
        addqd   -1,r5           #y1 --
        addqd   1,r4            #x1 ++
        cmpd    r4,r5           #is x1 <= y1 ?
        ble     cirlp           #it is. Loop
cirot1: bne     cirout          #if x1 != y1, get out
        bsr     setgrp          #else set last group
cirout: restore [r4,r5,r6,r7]   #restore working registers
        ret     0               #and return


#
#Setgrp sets eight points on a circle, given starting x and y, and the
#current x offset and y offset.
#
# Inputs:
#       r0 = centerpoint of circle (x coordinate)
#       r1 = centerpoint of circle (y coordinate)
#       r2 = line width
#       r4 = x offset
#       r5 = y offset
#
# Outputs:
#       all registers preserved.
#
        .align  4
setgrp: movd    r6,tos          #get two temporary values
        movd    r7,tos
        movd    r0,r6           #save old x
        movd    r1,r7           #and y
        movd    r5,r1
        subd    r4,r1           #r1 = (y1 - x1)
        cmpd    r1,hlfwdth      #if the difference is less than
        ble     sg1:w           #half the width, fill in the edges
        movd    r7,r1           #restore y
        addd    r4,r0           #x += x1
        addd    r5,r1           #y += y1
        bsr     vline           #do a vline
        movd    r6,r0           #restore x and y
        movd    r7,r1
        addd    r4,r0           #x += x1
        subd    r5,r1           #y -= y1
        bsr     vline
        movd    r6,r0           #restore x and y
        movd    r7,r1
        subd    r4,r0           #x -= x1
        addd    r5,r1           #y += y1
        bsr     vline
        movd    r6,r0           #restore x and y
        movd    r7,r1
        subd    r4,r0           #x -= x1
        subd    r5,r1           #y -= y1
        bsr     vline
        movd    r6,r0           #restore x and y
        movd    r7,r1
        addd    r5,r0           #x += y1
        addd    r4,r1           #y += x1
        bsr     hline
        movd    r6,r0           #restore x and y
        movd    r7,r1
        addd    r5,r0           #x += y1
        subd    r4,r1           #y -= x1
        bsr     hline
        movd    r6,r0           #restore x and y
        movd    r7,r1
        subd    r5,r0           #x -= y1
        addd    r4,r1           #y += x1
        bsr     hline
        movd    r6,r0           #restore x and y
        movd    r7,r1
        subd    r5,r0           #x -= y1
        subd    r4,r1           #y -= x1
        bsr     hline
        movd    r6,r0           #restore x and y
        movd    r7,r1
        movd    tos,r7          #and unstack
        movd    tos,r6
        ret     0

sg1:    movd    r7,r1           #restore y
        addd    r4,r0           #x += x1
        addd    r5,r1           #y += y1
        bsr     hline           #do a hline
        bsr     vline           #and a vline
        movd    r6,r0           #restore x and y
        movd    r7,r1
        addd    r4,r0           #x += x1
        subd    r5,r1           #y -= y1
        bsr     hline
        bsr     vline
        movd    r6,r0           #restore x and y
        movd    r7,r1
        subd    r4,r0           #x -= x1
        addd    r5,r1           #y += y1
        bsr     hline
        bsr     vline
        movd    r6,r0           #restore x and y
        movd    r7,r1
        subd    r4,r0           #x -= x1
        subd    r5,r1           #y -= y1
        bsr     hline
        bsr     vline
        movd    r6,r0           #restore x and y
        movd    r7,r1
        addd    r5,r0           #x += y1
        addd    r4,r1           #y += x1
        bsr     vline
        bsr     hline
        movd    r6,r0           #restore x and y
        movd    r7,r1
        addd    r5,r0           #x += y1
        subd    r4,r1           #y -= x1
        bsr     vline
        bsr     hline
        movd    r6,r0           #restore x and y
        movd    r7,r1
        subd    r5,r0           #x -= y1
        addd    r4,r1           #y += x1
        bsr     vline
        bsr     hline
        movd    r6,r0           #restore x and y
        movd    r7,r1
        subd    r5,r0           #x -= y1
        subd    r4,r1           #y -= x1
        bsr     vline
        bsr     hline
        movd    r6,r0           #restore x and y
        movd    r7,r1
        movd    tos,r7          #and unstack
        movd    tos,r6
        ret     0
#
#A vertical line drawing algorithm, making use of the SBITPS instruction.
#
# Inputs:
#       r0 = x coordinate of line
#       r1 = centerpoint of y coordinate of line
#       r2 = line length
#
# Outputs:
#       no registers altered.
#       line drawn in memory.
#
        .align  4
vline:  save    [r0,r1,r2,r3]   #save working registers
        subd    hlfwdth,r1      #y -= half of width to centre vline
        addr    @(xwarp-1),r3   #r3 = xwarp - 1
        indexd  r1,r3,r0        #bit off = y * (xwarp) + x
        addqd   1,r3            #move to correct warp value
        movd    _page,r0        #page address in r0
        sbitps                  #set bit perpendicular string
        restore [r0,r1,r2,r3]   #restore registers
        ret     0
#
#A horizontal line drawing algorithm, using SBITS.
#
# Inputs:
#       r0 = centerpoint of x coordinate
#       r1 = y coordinate of line
#       r2 = line length
#
        .align  4
hline:  save    [r0,r1,r3]      #save working registers
        subd    hlfwdth,r0      #x -= half of width to centre values
        indexd  r1,$(xwarp - 1),r0  #bit off = (y * xwarp) + x
        movd    _page,r0        #page address in r0
        addr    stab,r3         #address of sbits table
        sbits
        restore [r0,r1,r3]
        ret     0
Figure 1 shows this algorithm ‘at work’. 20 circles of radius
350 pixels, and widths of 1 to 20 pixels are shown. A full
listing of this test program is shown in Figure 2.


4.0 TIMING 


The execution speed of this algorithm is dependent on the
radius of the circle, and the circle’s width. The test program
supplied executes in 2.92 seconds on an NS32016 at
10 MHz with no wait states. The execution time on the
NS32CG16 at 15 MHz with no wait states is 1.54 seconds.
By using macros for the VLINE and HLINE routines, instead
of subroutine calls, the time can be further reduced to 1.39
seconds.
        .data
        .set    xwarp,2544      #bits of xwarp to get to next scan
        .comm   _page,4
hlfwdth:.double 0
        .text
#
# Test is a C-callable function that creates Figure 1.
#
        .globl  _test
_test:  save    [r3,r4,r5,r6,r7]
        addr    @400,r0         #start at x=400
        addr    @400,r1         #         y=400
        movqd   1,r2            #width = 1
        addr    @350,r3         #radius = 350
        addr    @20,r7          #we want to do 20 circles
lp:     bsr     circle          #do a circle
        addr    80(r0),r0       #x += 80
        addqd   1,r2            #width += 1
        acbd    -1,r7,lp        #loop for all 20 circles
        restore [r3,r4,r5,r6,r7]
        ret     0               #and return

#Bresenham's circle algorithm, as expressed in "Computer Graphics" by 
#Donald Hearn and M. Pauline Baker (1986, Prentice-Hall, 
#ISBN 9-13-165382-2) 


# 


# 
# 
# 
# 
# 
# 
# 
# 
# 
# 
# 


otes: 
# 
# 
# 
# 
# 
# 
# 
# 
c 


ircle: 


Inputs: 

x coodinate of centre of circle 
y coodinate of centre of circle 
width (in pixels) 

radius (in pixels) 


ber | 
—_2 
if not 


Outputs: 
no registers altered 
circle drawn in ram 


This routine uses two special case line drawing routines: 
a horizontal case (called HLINE) 
a vertical case (called VLINE) 
A general purpose line drawing algorithm could be used, however 
the new 32CG16 instructions are much faster. 
If the line is to have a width of > 25 pixels, the BIGSET algorithm 
must be added to the HLINE routine. No other changes are required. 


save [r4,r5,r6,r7] #save our working registers 
movd r2,r7 #get current width 


FIGURE 2 
        lshd    $-1,r7          #divide by two
        movd    r7,hlfwdth      #and store it away
        movqd   0,r4            #x1 = 0
        movd    r3,r5           #y1 = radius
        movqd   3,r6            #p = 3 - (radius * 2)
        subd    r3,r6
        subd    r3,r6
        br      cirtest
        .align  4
cirlp:  bsr     setgrp          #set a group of points
        cmpqd   0,r6            #is p less than zero?
        blt     pge0            #no, it is not. skip
        addr    6(r6)[r4:d],r6  #p += 4 * x1 + 6
        addqd   1,r4            #x1 ++
cirtest:cmpd    r4,r5           #is x1 <= y1 ?
        ble     cirlp           #it is. Loop
        br      cirout
        .align  4
pge0:   movd    r4,r7           #t = x1
        subd    r5,r7           #t = x1 - y1
        addr    10(r6)[r7:d],r6 #p += 4 * (x1 - y1) + 10
        addqd   -1,r5           #y1 --
        addqd   1,r4            #x1 ++
        cmpd    r4,r5           #is x1 <= y1 ?
        ble     cirlp           #it is. Loop
cirout: restore [r4,r5,r6,r7]   #restore working registers
        ret     0               #and return


# 


#Setgrp sets eight points on a circle, given starting x and y, and the 


#current xoffset and y offset. 


setgrp: movd r6,tos 
movd r7,tos 
movd r?,r6 


# 

# Inputs: 

# 

F} = 

# r2 = line width 
# r4 = x offset 
# r5 = y offset 
# 

# Ouputs: 

# 

# 


r® = centerpoint of circle (x coodinate) 
ri = centerpoint of circle (y coodinate) 


all registers preserved. 


#¥get two temporary values 


#save old x 
movd
movd 
subd 
cmpd 
ble 

movd 
addd 
addd 
bsr 

movd 
movd 
addd 
subd 
bsr 

movd 
movd 
subd 
addd 
bsr 

movd 
movd 
subd 
subd 
bsr 


movd 
movd 
addd 
addd 
bsr 

movd 
movd 
addd 
subd 
bsr 

movd 
movd 
subd 
addd 
bsr 

movd 
movd 
subd 
subd 
bsr 

movd 
movd 
movd 


ri,r7 
r5,ri 
r4,r1 
ri, hifwdth 
sgi 
r7,r1 
r4,r9 
r5,r1 
vline 
r6,r@ 
r7,ri 
r4,rQ 
r5,ri 
vline 
r6,r0 
r7,r 
r4,r8 
r5,r1 
viine 
r6,r9 
r7,ri 
r4,r8 
r5,r1 
vline 


r6,r9 
r7,e1 
r5,r8 
r4r1 
hline 
r6,r0 
r7,ri 
r5,rp 
r4,ri 
hline 
r6,r8 
r7,ri 
r5,r8 
r4,ri 
hline 
r6,r9 
r7,r1 
r5,rP 
r4,r1 
hline 
r6,r9 
r7,r1 
tos,r/ 


#and y 


#ri = (yi - x1) 
#if the difference is less than 
#half the width, fill in the edges 


#restore 
#x += x1 


ty += y1 


#tdo a vline 


Y 


#restore x and y 


#x += x1 


Wy -= y!1 
#restore 


#x -= x1 


#y += yl 
#restore 
#x -= x1 
#y -= yl 
#restore 


#x += yl 
#y += x1 


#restore 


#x += y1 
Hy -= x! 


#restore 


#x -= yl 
#y += x1 


#restore 


#x -= yl 
fy -= x1 


x 


x 


x 


x 


x 


x 


and y 


and y 


and y 


and y 


and y 


and y 


#restore x and y 


#and unstack 
sg:


#restore 
#x += xi 
Hy += y! 


y 


#do a hline 


wang @ Viine 


#restore 
#x += x1 
fy -= yl 
#restore 
#x -= xi 
#y +2 yl 
#restore 


#x -= xi 


ty -z yl 


#restore 
#x += y! 
#y += x1 
#restore 
#x += yi 
Hy -= xi 
#restore 
#x -= yi 
#y += xi 
#restore 


#x -= y! 


x and y 


x and y 


x and y 


x and y 


x and y 


x and y 


x and y 
        subd    r4,r1           #y -= x1
        bsr     vline
        bsr     hline
        movd    r6,r0           #restore x and y
        movd    r7,r1
        movd    tos,r7          #and unstack
        movd    tos,r6
        ret     0
#
#A vertical line drawing algorithm, making use of the SBITPS instruction.
#
# Inputs:
#       r0 = x coordinate of line
#       r1 = centerpoint of y coordinate of line
#       r2 = line length
#
# Outputs:
#       no registers altered.
#       line drawn in memory.
#
        .align  4
vline:  save    [r0,r1,r2,r3]   #save working registers
        subd    hlfwdth,r1      #y -= half of width to centre vline
        addr    @(xwarp-1),r3   #r3 = xwarp - 1
        indexd  r1,r3,r0        #bit off = y * (xwarp) + x
        addqd   1,r3            #move to correct warp value
        movd    _page,r0        #page address in r0
#       sbitps                  #set bit perpendicular string
# - Start of SBITPS emulation code
        .align  4
sblp:   sbitd   r1,0(r0)        #set required bit
        addd    r3,r1           #add the bit warp
        acbd    -1,r2,sblp      #loop for the fill
# - End of SBITPS emulation code
        restore [r0,r1,r2,r3]   #restore registers
        ret     0
#
#A horizontal line drawing algorithm, using SBITS.
#
# Inputs:
#       r0 = centerpoint of x coordinate
#       r1 = y coordinate of line
#       r2 = line length
#
        .align  4
# 
#A horizontal line drawing algorithm, using SBITS. 
# 
# Inputs: 
# r? = centerpoint of x coordinate 
# ri = y coodinate of line 
# r2 = line length 
# 
align 4 
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hline: save (r@,ri1,r3) #save working registers 
subd hi fudth, r9 #x -= half of width to centre values 
indexd r1,$(xwarp - 1),r8 # bit off = (y * xwarp) + x 
movd _page, rp #page address in rB 
# addr stab, r3 | Waddress of sbits table 
# SBITS 
# - start of SBITS emulation code 
movad 7,r3 
andd r1,r3 
addd r3,r3 #2 
addd r3,r3 oH" G 
addd r3,r3. #* 8 
addd r3,r3. 0 #* 16 
eddd r3,r3 -#* 32 
addd r2,r3 
ashd $-3,r1 


stab: 


ord stab(r3:d) ,9(rB) [risb) 
of SBITS emulation code 
restore [r@,r1,r3] 

ret 


data 

        .double h'00000000,h'00000001,h'00000003,h'00000007
        .double h'0000000f,h'0000001f,h'0000003f,h'0000007f
        .double h'000000ff,h'000001ff,h'000003ff,h'000007ff
        .double h'00000fff,h'00001fff,h'00003fff,h'00007fff
        .double h'0000ffff,h'0001ffff,h'0003ffff,h'0007ffff
        .double h'000fffff,h'001fffff,h'003fffff,h'007fffff
        .double h'00ffffff,h'01ffffff,h'03ffffff,h'07ffffff
        .double h'0fffffff,h'1fffffff,h'3fffffff,h'7fffffff
        .double h'00000000,h'00000002,h'00000006,h'0000000e
        .double h'0000001e,h'0000003e,h'0000007e,h'000000fe
        .double h'000001fe,h'000003fe,h'000007fe,h'00000ffe
        .double h'00001ffe,h'00003ffe,h'00007ffe,h'0000fffe
        .double h'0001fffe,h'0003fffe,h'0007fffe,h'000ffffe
        .double h'001ffffe,h'003ffffe,h'007ffffe,h'00fffffe
        .double h'01fffffe,h'03fffffe,h'07fffffe,h'0ffffffe
        .double h'1ffffffe,h'3ffffffe,h'7ffffffe,h'fffffffe
        .double h'00000000,h'00000004,h'0000000c,h'0000001c
        .double h'0000003c,h'0000007c,h'000000fc,h'000001fc
        .double h'000003fc,h'000007fc,h'00000ffc,h'00001ffc
        .double h'00003ffc,h'00007ffc,h'0000fffc,h'0001fffc
        .double h'0003fffc,h'0007fffc,h'000ffffc,h'001ffffc
        .double h'003ffffc,h'007ffffc,h'00fffffc,h'01fffffc
        .double h'03fffffc,h'07fffffc,h'0ffffffc,h'1ffffffc
        .double h'3ffffffc,h'7ffffffc,h'fffffffc,h'fffffffc
        .double h'00000000,h'00000008,h'00000018,h'00000038
        .double h'00000078,h'000000f8,h'000001f8,h'000003f8
        .double h'000007f8,h'00000ff8,h'00001ff8,h'00003ff8
        .double h'00007ff8,h'0000fff8,h'0001fff8,h'0003fff8
        .double h'0007fff8,h'000ffff8,h'001ffff8,h'003ffff8
        .double h'007ffff8,h'00fffff8,h'01fffff8,h'03fffff8
        .double h'07fffff8,h'0ffffff8,h'1ffffff8,h'3ffffff8
        .double h'7ffffff8,h'fffffff8,h'fffffff8,h'fffffff8
        .double h'00000000,h'00000010,h'00000030,h'00000070
        .double h'000000f0,h'000001f0,h'000003f0,h'000007f0
        .double h'00000ff0,h'00001ff0,h'00003ff0,h'00007ff0
        .double h'0000fff0,h'0001fff0,h'0003fff0,h'0007fff0
        .double h'000ffff0,h'001ffff0,h'003ffff0,h'007ffff0
        .double h'00fffff0,h'01fffff0,h'03fffff0,h'07fffff0
        .double h'0ffffff0,h'1ffffff0,h'3ffffff0,h'7ffffff0
        .double h'fffffff0,h'fffffff0,h'fffffff0,h'fffffff0
        .double h'00000000,h'00000020,h'00000060,h'000000e0
        .double h'000001e0,h'000003e0,h'000007e0,h'00000fe0
        .double h'00001fe0,h'00003fe0,h'00007fe0,h'0000ffe0
        .double h'0001ffe0,h'0003ffe0,h'0007ffe0,h'000fffe0
        .double h'001fffe0,h'003fffe0,h'007fffe0,h'00ffffe0
        .double h'01ffffe0,h'03ffffe0,h'07ffffe0,h'0fffffe0
        .double h'1fffffe0,h'3fffffe0,h'7fffffe0,h'ffffffe0
        .double h'ffffffe0,h'ffffffe0,h'ffffffe0,h'ffffffe0
        .double h'00000000,h'00000040,h'000000c0,h'000001c0
        .double h'000003c0,h'000007c0,h'00000fc0,h'00001fc0
        .double h'00003fc0,h'00007fc0,h'0000ffc0,h'0001ffc0
        .double h'0003ffc0,h'0007ffc0,h'000fffc0,h'001fffc0
        .double h'003fffc0,h'007fffc0,h'00ffffc0,h'01ffffc0
        .double h'03ffffc0,h'07ffffc0,h'0fffffc0,h'1fffffc0
        .double h'3fffffc0,h'7fffffc0,h'ffffffc0,h'ffffffc0
        .double h'ffffffc0,h'ffffffc0,h'ffffffc0,h'ffffffc0
        .double h'00000000,h'00000080,h'00000180,h'00000380
        .double h'00000780,h'00000f80,h'00001f80,h'00003f80
        .double h'00007f80,h'0000ff80,h'0001ff80,h'0003ff80
        .double h'0007ff80,h'000fff80,h'001fff80,h'003fff80
        .double h'007fff80,h'00ffff80,h'01ffff80,h'03ffff80
        .double h'07ffff80,h'0fffff80,h'1fffff80,h'3fffff80
        .double h'7fffff80,h'ffffff80,h'ffffff80,h'ffffff80
        .double h'ffffff80,h'ffffff80,h'ffffff80,h'ffffff80
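
The table above appears to follow a regular pattern: each entry is a run of set bits whose count is the line length (0 to 31), shifted left by the starting bit position within the byte (0 to 7), and truncated to 32 bits. The following C sketch regenerates an entry under that assumption. The names stab_entry and stab_index are illustrative only and do not appear in the listing.

```c
#include <stdint.h>

/* Illustrative helper (not part of the original listing): returns the
   mask assumed to be stored at stab[(start_bit * 32) + length], with
   start_bit in 0..7 and length in 0..31. */
uint32_t stab_entry(unsigned start_bit, unsigned length)
{
    /* A run of 'length' one-bits; length 0 is handled separately so
       that the shift count never reaches 32. */
    uint32_t run = (length == 0) ? 0u : (0xFFFFFFFFu >> (32u - length));

    /* Shifting a 32-bit unsigned value discards bits above bit 31,
       which matches the truncated entries at the end of each group. */
    return (uint32_t)(run << start_bit);
}

/* Illustrative helper: the table index that the hline emulation code
   forms from the low three bits of the bit offset and the length. */
unsigned stab_index(uint32_t bit_offset, unsigned length)
{
    return (unsigned)((bit_offset & 7u) * 32u) + length;
}
```

For example, stab_entry(1, 31) yields h'fffffffe, the last entry of the second group of eight rows.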
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5.0 CONCLUSIONS 


The NS32CG16 provides several instructions that increase
the speed of imaging common graphic items such as circles,
lines, and ellipses. The NS32CG16's high code density and
fast execution make it ideal for intensive graphics
processing.

This algorithm is optimized for speed. It does, however,
show an apparent "thinning" on the 45° boundaries when the
width of the circle is greater than five pixels. An alternate
algorithm will be presented in a future applications note.
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Introduction to 
Bresenham’s Line 
Algorithm Using the SBIT 
Instruction; Series 32000® 
Graphics Note 5 


1.0 INTRODUCTION 


Even with today's achievements in graphics technology, the
resolution of computer graphics systems will never reach
that of the real world. A true real line can never be drawn on
a laser printer or CRT screen. There is no method of
accurately printing all of the points on the continuous line
described by the equation y = mx + b. Similarly, circles,
ellipses and other geometrical shapes cannot truly be
implemented by their theoretical definitions because the
graphics system itself is discrete, not real or continuous.
For that reason, there has been a tremendous amount of
research and development in the area of discrete or raster
mathematics. Many algorithms have been developed which "map"
real-world images into the discrete space of a raster device.
Bresenham's line-drawing algorithm (and its derivatives) is
one of the most commonly used algorithms today for
describing a line on a raster device. The algorithm was first
published in Bresenham's 1965 article entitled "Algorithm
for Computer Control of a Digital Plotter". It is now widely
used in graphics and electronic printing systems. This
application note will describe the fundamental algorithm and
show an implementation on National Semiconductor's
Series 32000 microprocessor using the SBIT instruction, which
is particularly well-suited for such applications. A timing
diagram can be found in Figure 8 at the end of the application
note.


2.0 DESCRIPTION 


Bresenham's line-drawing algorithm uses an iterative
scheme. A pixel is plotted at the starting coordinate of the
line, and each iteration of the algorithm increments the pixel
one unit along the major, or x-axis. The pixel is incremented
along the minor, or y-axis, only when a decision variable
(based on the slope of the line) changes sign. A key feature
of the algorithm is that it requires only integer data and
simple arithmetic. This makes the algorithm very efficient and
fast.

The algorithm assumes the line has positive slope less than
one, but a simple change of variables can modify the
algorithm for any slope value. This will be detailed in
section 2.2.


2.1 Bresenham's Algorithm for 0 < slope < 1

Figure 1 shows a line segment superimposed on a raster
grid with horizontal axis X and vertical axis Y. Note that x_i
and y_i are the integer abscissa and ordinate respectively of
each pixel location on the grid.

Given (x_i, y_i) as the previously plotted pixel location for
the line segment, the next pixel to be plotted is either
(x_i + 1, y_i) or (x_i + 1, y_i + 1). Bresenham's algorithm
determines which of these two pixel locations is nearer to the
actual line by calculating the distance from each pixel to the
line, and plotting that pixel with the smaller distance. Using
the familiar equation of a straight line, y = mx + b, the y
value corresponding to x_i + 1 is

    y = m(x_i + 1) + b

The two distances are then calculated as:

    d1 = y - y_i
    d1 = m(x_i + 1) + b - y_i
    d2 = (y_i + 1) - y
    d2 = (y_i + 1) - m(x_i + 1) - b

and,

    d1 - d2 = m(x_i + 1) + b - y_i - (y_i + 1) + m(x_i + 1) + b
    d1 - d2 = 2m(x_i + 1) - 2y_i + 2b - 1

Multiplying this result by the constant dx, defined by the
slope of the line m = dy/dx, the equation becomes:

    dx(d1 - d2) = 2dy(x_i) - 2dx(y_i) + c

where c is the constant 2dy + 2dx(b) - dx. Of course, if
d2 > d1, then (d1 - d2) < 0, or conversely if d1 > d2, then
(d1 - d2) > 0. Because dx is positive, multiplying by dx does
not change the sign. Therefore, a parameter p_i can be defined
such that

    p_i = dx(d1 - d2)
    p_i = 2dy(x_i) - 2dx(y_i) + c

If p_i > 0, then d1 > d2 and y_{i+1} is chosen such that the
next plotted pixel is (x_i + 1, y_i + 1). Otherwise, if
p_i < 0, then d2 > d1 and (x_i + 1, y_i) is plotted. (See
Figure 2.)

FIGURE 2. Distances d1 and d2 are compared.
The smaller distance marks the next pixel to be plotted.

Similarly, for the next iteration, p_{i+1} can be calculated
and compared with zero to determine the next pixel to plot. If
p_{i+1} < 0, then the next plotted pixel is at
(x_{i+1} + 1, y_{i+1}); if p_{i+1} > 0, then the next point is
(x_{i+1} + 1, y_{i+1} + 1). Note that in the equation for
p_{i+1}, x_{i+1} = x_i + 1:

    p_{i+1} = 2dy(x_i + 1) - 2dx(y_{i+1}) + c

Subtracting p_i from p_{i+1}, we get the recursive equation:

    p_{i+1} = p_i + 2dy - 2dx(y_{i+1} - y_i)

Note that the constant c has conveniently dropped out of
the formula. And, if p_i < 0 then y_{i+1} = y_i in the above
equation, so that:

    p_{i+1} = p_i + 2dy

or, if p_i > 0 then y_{i+1} = y_i + 1, and

    p_{i+1} = p_i + 2(dy - dx)

To further simplify the iterative algorithm, constants c1 and
c2 can be initialized at the beginning of the program such
that c1 = 2dy and c2 = 2(dy - dx). Thus, the actual meat of
the algorithm is a loop of length dx, containing only a few
integer additions and two compares (Figure 3).
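
The recurrence above can be sketched in C as follows. This is a minimal illustration for a line with 0 <= dy <= dx and xs < xf, not the routine given in the Appendix. The function name and output arrays are hypothetical, the starting value p = 2dy - dx is the one used by the Appendix program, and the case p = 0 is treated like p > 0, as the Appendix program does with its else branch.

```c
#include <stddef.h>

/* Hypothetical helper: stores the pixels of a line from (xs, ys) to
   (xf, yf), assuming 0 <= dy <= dx and xs < xf, into x_out/y_out and
   returns how many pixels were stored (at most max_points). */
size_t bresenham_first_octant(int xs, int ys, int xf, int yf,
                              int *x_out, int *y_out, size_t max_points)
{
    int dx = xf - xs;
    int dy = yf - ys;
    int c1 = 2 * dy;            /* added to p when p < 0            */
    int c2 = 2 * (dy - dx);     /* added to p otherwise             */
    int p  = 2 * dy - dx;       /* starting decision variable       */
    int x = xs, y = ys;
    size_t n = 0;

    if (max_points == 0)
        return 0;

    x_out[n] = x;               /* plot the starting pixel          */
    y_out[n] = y;
    n++;

    while (x < xf && n < max_points) {
        if (p < 0) {
            p += c1;            /* stay on the same row             */
        } else {
            p += c2;            /* step one unit along the y-axis   */
            y += 1;
        }
        x += 1;                 /* always step along the x-axis     */
        x_out[n] = x;
        y_out[n] = y;
        n++;
    }
    return n;
}
```

For the line from (0, 0) to (5, 2) this sketch produces the six pixels (0,0), (1,0), (2,1), (3,1), (4,2) and (5,2).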


2.2 For Slope < 0 and |Slope| > 1

The algorithm fails when the slope is negative or has
absolute value greater than one (|dy| > |dx|). The reason for
this is that the line will always be plotted with a positive
slope if x_i and y_i are always incremented in the positive
direction, and the line will always be "shorted" if
|dx| < |dy| since the algorithm executes once for every x
coordinate (i.e., dx times). However, a closer look at the
algorithm reveals that a few simple changes of variables will
take care of these special cases.

For negative slopes, the change is simple. Instead of
incrementing the pixel along the positive direction (+1) for
each iteration, the pixel is incremented in the negative
direction. The relationship between the starting point and the
finishing point of the line determines which axis is followed
in the negative direction, and which is in the positive.
Figure 4 shows all the possible combinations for slopes and
starting points, and their respective incremental directions
along the X and Y axis.

Another change of variables can be performed on the
incremental values to accommodate those lines with slopes
greater than 1 or less than -1. The coordinate system
containing the line is rotated 90 degrees so that the X-axis
now becomes the Y-axis and vice versa. The algorithm is then
performed on the rotated line according to the sign of its
slope, as explained above. Whenever the current position is
incremented along the X-axis in the rotated space, it is
actually incremented along the Y-axis in the original
coordinate space. Similarly, an increment along the Y-axis in
the rotated space translates to an increment along the X-axis
in the original space. Figure 4a., g. and h. illustrate this
translation process for both positive and negative lines with
various starting points.


3.0 IMPLEMENTATION IN C

Bresenham's algorithm is easily implemented in most
programming languages. However, C is commonly used for
many application programs today, especially in the graphics
area. The Appendix gives an implementation of Bresenham's
algorithm in C. The C program was written and executed on a
SYS32/20 system running UNIX on the NS32032 processor from
National. A driver program, also written in C, passed to the
function starting and ending points for each line to be drawn.
Figure 6 shows the output on an HP LaserJet of 160 unique
lines of various slopes on a bit map of 2,000 x 2,000 pixels.
Each line starts and ends exactly 25 pixels from the previous
line.

The program uses the variable bit to keep track of the
current pixel position within the 2,000 x 2,000 bit map
(Figure 5). When the Bresenham algorithm requires the current
position to be incremented along the X-axis, the variable bit
is incremented by either +1 or -1, depending on the sign of
the slope. When the current position is incremented along
the Y-axis (i.e., when p > 0) the variable bit is incremented
by +warp or -warp, where warp is the vertical bit
displacement of the bit map. The constant last_bit is compared
with bit during each iteration to determine if the line is
complete. This ensures that the line starts and finishes
according to the coordinates passed to the function by the
driver program.


    do while count <> dx
        if (p < 0) then p += c1
        else
            p += c2
            next_y = prev_y + y_inc
        next_x = prev_x + x_inc
        plot (next_x, next_y)
        count += 1

    /* PSEUDO CODE FOR BRESENHAM LOOP */

FIGURE 3
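
The changes of variables described in section 2.2 can be combined with the loop of Figure 3 in one routine. The C sketch below is an illustration in (x, y) coordinates under stated assumptions, not the Appendix program: the point_log structure and the function names are hypothetical, the starting pixel is recorded, and lines are assumed to have at most 64 pixels.

```c
#include <stdlib.h>

/* Hypothetical recorder for plotted pixels (illustration only). */
struct point_log {
    int x[64];
    int y[64];
    int n;
};

static void log_point(struct point_log *log, int x, int y)
{
    if (log->n < 64) {
        log->x[log->n] = x;
        log->y[log->n] = y;
        log->n++;
    }
}

/* Sketch of Bresenham's loop for any slope: the signs of dx and dy
   select the increment directions, and when |dy| > |dx| the roles of
   the two axes are exchanged (the "rotated" case). */
void bresenham_any_slope(int xs, int ys, int xf, int yf,
                         struct point_log *log)
{
    int adx = abs(xf - xs);
    int ady = abs(yf - ys);
    int x_inc = (xf >= xs) ? 1 : -1;
    int y_inc = (yf >= ys) ? 1 : -1;
    int steep = ady > adx;              /* |slope| > 1: exchange axes */
    int major = steep ? ady : adx;      /* number of iterations       */
    int minor = steep ? adx : ady;
    int c1 = 2 * minor;
    int c2 = 2 * (minor - major);
    int p  = 2 * minor - major;
    int x = xs, y = ys;
    int count;

    log_point(log, x, y);
    for (count = 0; count < major; count++) {
        if (p < 0) {
            p += c1;                    /* no step along minor axis   */
        } else {
            p += c2;                    /* step along the minor axis  */
            if (steep) x += x_inc; else y += y_inc;
        }
        if (steep) y += y_inc; else x += x_inc;  /* major axis step   */
        log_point(log, x, y);
    }
}
```

A line from (0, 0) to (-2, -5), for example, takes five iterations along the Y-axis and two steps along the X-axis.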
[FIGURE 4. Incremental directions x_inc and y_inc for each
combination of slope m and starting point (p1 or p2), shown in
panels a. through h. The slope cases that can be read from the
panels include m = inf, m = 0, -1 < m < 0, m = 1, 0 < m < 1
and m > 1.]

Note: a., g., and h. are rotated 90 degrees left and x', y' refer to the original axis.
[FIGURE 5. Bit map layout. The first row runs from bit = 0 to
bit = 1,999, and warp = 2,000 is the bit displacement from one
row to the next. The figure marks the starting position and the
current position of bit.]

Bit Map is 500 kbytes, 2k x 2k Bits
Base Address of Bit Map is 'Bit_Map'

FIGURE 5
Graphics Image (2000 x 2000 Pixels), 300 DPI

FIGURE 6. Star-Burst Benchmark. This Star-Burst image was done on a 2k x 2k pixel bit map.
Each line is 2k pixels in length and passes through the center of the image, bisecting
the square. The lines are 25 pixel units apart, and are drawn using the LINE_DRAW.S routine. There
are a total of 160 lines. The total time for drawing this Star-Burst is 2.9 sec on 10 MHz NS32C016.
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4.0 IMPLEMENTATION IN SERIES 32000 ASSEMBLY: 
THE SBIT INSTRUCTION 
National's Series 32000 family of processors is well-suited
for Bresenham's algorithm because of the SBIT instruction.
Figure 7 shows a portion of the assembly version of the
Bresenham algorithm illustrating the use of the SBIT
instruction. The first part of the loop handles the algorithm
for p < 0 and .CASE2 handles the algorithm for p > 0. The main
loop is unrolled in this manner to minimize unnecessary
branches (compare loop structure of Figure 7 to Figure 3).
The SBIT instruction is used to plot the current pixel in the
line.

The SBIT instruction uses bit_map as a base address from
which it calculates the bit position to be set by adding the
offset bit contained in register r1. For example, if bit, or
R1, contains 2,000*, then the instruction:

    sbitd r1,@_bit_map

will set the bit at position 2,000, given that bit_map is the
memory location starting at bit 0 of this grid. In actuality,
if base is a memory address, then the bit position set is:

    offset MOD 8

within the memory byte whose address is:

    base + (offset DIV 8)

So, for the above example,

    2,000 MOD 8 = 0
    bit_map + 2,000 DIV 8 = bit_map + 250

Thus, bit 0 of byte (bit_map + 250) is set. This bit
corresponds to the first bit of the second row in Figure 5.


*All numbers are in decimal.
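
The addressing rule above can be expressed in a few lines of C. This sketch only mirrors the MOD 8 / DIV 8 arithmetic described in the text for non-negative offsets. The function name sbit_emulate is hypothetical, and the sketch makes no claim about the timing or flag behavior of the real instruction.

```c
#include <stdint.h>

/* Illustrative emulation of the bit addressing described above:
   sets bit (offset MOD 8) of the byte at base + (offset DIV 8).
   Only non-negative offsets are considered in this sketch. */
void sbit_emulate(uint8_t *base, uint32_t offset)
{
    base[offset / 8u] |= (uint8_t)(1u << (offset % 8u));
}
```

With offset = 2,000 the sketch sets bit 0 of byte 250, which agrees with the worked example.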


# Main loop of Bresenham algorithm
.LOOP:                                  #p < 0: move in x direction only
        cmpqd   $0,r4
        ble     .CASE2
        addd    r0,r4
        addd    r5,r1
        sbitd   r1,@_bit_map
        cmpd    r3,r1
        bne     .LOOP
        exit    [r3,r4,r5,r6,r7]
        ret     $0
        .align  4
.CASE2:                                 #p > 0: move in x and y direction
        addd    r2,r4
        addd    r7,r1
        addd    r5,r1
        sbitd   r1,@_bit_map
        cmpd    r1,r3
        bne     .LOOP
        exit    [r3,r4,r5,r6,r7]
        ret     $0


The SBIT instruction greatly increases the speed of the
algorithm. Notice the method of setting the pixel in the C
program given in the Appendix:

    bit_map[bit/8] |= bit_pos[(bit & 7)]

This line of code contains a costly division and several other
operations that are eliminated with the SBIT instruction. The
SBIT instruction helps optimize the performance of the
program. Notice also that the algorithm can be implemented
using only 7 registers. This improves the speed performance
by avoiding time-consuming memory accesses.


5.0 CONCLUSION 


An optimized Bresenham line-drawing algorithm has been
presented using the SYS32/20 system. Both Series 32000
assembly and C versions have been included. Figure 8
presents the various timing results of the algorithm. Most of
the optimization efforts have been concentrated in the main
loop of the program, so the reader may spot other ways to
optimize, especially in the set-up section of the algorithm.

Several variations of the Bresenham algorithm have been
developed. One particular variation from Bresenham himself
relies on "run-length" segments of the line for speed
optimization. The algorithm is based on the original Bresenham
algorithm, but uses the fact that typically the decision
variable p has one sign for several iterations, changing only
once in-between these "run-length" segments to make one
vertical step. Thus, most lines are composed of a series of
horizontal "run-lengths" separated by a single vertical jump.
(Consider the special cases where the slope of the line is
exactly 1, the slope is 0 or the slope is infinity.) This
algorithm will be explored in the NS32CG16 Graphics Note 5,
AN-522, "Line Drawing with the NS32CG16", where it will
be optimized using special instructions of the NS32CG16.


Register and Memory Contents

    r0        = c1 constant
    r1        = bit current position
    r2        = c2 constant
    r3        = last_bit
    r4        = p decision var
    r5        = x_inc increment
    r6        = unused register
    r7        = y_inc increment
    _bit_map  = address of first byte in bit map

FIGURE 7

Note: Instructions followed by the letter 'd' indicate "double word" operations.
Timing Performance
2k x 2k Bit Map
2k Pix/Vector, 160 Lines per Star-Burst

    Version                  NS32000 Assembly with SBIT
                             NS32C016-10     NS32C016-15
    Set-up Time Per Vector   (illegible)     (illegible)
    Vectors/Sec              (illegible)     (illegible)
    Pixels/Sec               109,776         164,771

FIGURE 8. Star-Burst Benchmark
Set-up time per line is measured from the start of
LINE_DRAW.S only. The overhead of calling the LINE_DRAW
routine, starting the timer and creating the endpoints
of the vector are not included in this time. Set-up time does
include all register set-up and branching for the Bresenham
algorithm up to the entry point of the main loop.


Vectors/Second is determined by measuring the number
of vectors per second the LINE_DRAW routine can draw,
not including the overhead of the DRIVER.C and START.C
routines, which start the timer and calculate the vector
endpoints. All set-up of registers and branching for the
Bresenham algorithm are included.


Pixels/Second is calculated by multiplying the Vectors/Second
value by the number of pixels per line.


Total Time for the Star-Burst benchmark is measured from
start of benchmark to end. It does include all overhead of
START.C and DRIVER.C and all set-up for
LINE_DRAW.S. This number can be used to approximate
the number of pages per second for printing the whole
Star-Burst image.


#       National Semiconductor Corporation.
#       CTP version 2.4
#       -- line_draw.s --
        .file   "line_draw.s"
        .comm   _bit_map,499750
        .globl  _line_draw
        .set    WARP,2000
        .align  4
_line_draw:
        enter   [r3,r4,r5,r6,r7],12     # initialize
        movd    12(fp),r5               # r5=ys
        movd    8(fp),r6                # r6=xs
        movd    r5,r1                   # initialize starting 'bit'
        muld    $(WARP),r1              # bit=warp*ys+xs
        addd    r6,r1                   # r1=bit
        movd    20(fp),r4               # r4=yf
        subd    r5,r4                   # r4=dy
        absd    r4,r3                   # r3=|dy|
        movd    16(fp),r2               # r2=xf
        subd    r6,r2                   # r2=dx
        absd    r2,r6                   # r6=|dx|
        cmpd    r3,r6
        ble     .LL1                    # branch if slope<1
        cmpqd   $(0),r4                 # must rotate axis for slope>1
        bge     .LL2                    # if dy<0 want x_inc<0
        addr    WARP,r5                 # else x_inc is pos
        br      .LL3                    # x_inc=+/-warp because of rotate
        .align  4
.LL2:   addr    -WARP,r5
.LL3:   cmpqd   $(0),r2                 # if dx<0 want y_inc<0
        bge     .LL4                    # else y_inc is pos
        movqd   $(1),r7                 # y_inc=+/-1 because of rotate
        br      .LL5
        .align  4
.LL4:   movqd   $(-1),r7
.LL5:   movd    r6,r0                   # calculate c1,c2 and p
        addd    r0,r0                   # r0=c1=2*|dx| because of rotate
        subd    r3,r6                   # r6=|dx|-|dy|, r2=2*r6=c2
        addr    0[r6:w],r2              # this muls r6 by 2 and puts in r2
        movd    r0,r4
        subd    r3,r4                   # r4=c1-|dy|=p in rotated space
        movd    20(fp),r3               # calculate last_bit
        muld    $(WARP),r3
        addd    16(fp),r3               # r3=last_bit
        br      .LL6
        .align  4
.LL1:   cmpqd   $(0),r4                 # slope<1 use original axis
        bge     .LL7                    # dy determines y_inc
        addr    WARP,r7                 # dy>0 then y_inc=+warp
        br      .LL8
        .align  4
.LL7:   addr    -WARP,r7                # dy<0 then y_inc=-warp
.LL8:   cmpqd   $(0),r2
        bge     .LL9
        movqd   $(1),r5                 # dx>0 then x_inc=+1
        br      .LL10
        .align  4
.LL9:   movqd   $(-1),r5                # dx<0 then x_inc=-1
.LL10:  addr    0[r3:w],r0              # calculate c1,c2,p
                                        # r0=2*r3=c1
        movd    r3,r2
        subd    r6,r2
        addd    r2,r2                   # r2=2*(|dy|-|dx|)=c2
        movd    r0,r4
        subd    r6,r4                   # p=2*dy-dx=r4
        movd    20(fp),r3               # calculate last_bit=r3
        muld    $(WARP),r3
        addd    16(fp),r3
.LL6:   cmpqd   $(0),r4                 # main loop for algorithm
                                        # check sign of p
        ble     .LL11                   # branch if pos
        addd    r0,r4                   # add c1 to p
        addd    r5,r1                   # inc bit by x_inc only
        sbitd   r1,@_bit_map            # plot bit
        cmpd    r3,r1
        bne     .LL6                    # end only if bit=last_bit
        exit    [r3,r4,r5,r6,r7]
        ret     $(0)
        .align  4
.LL11:  addd    r2,r4                   # p>0 then inc in y dir
                                        # add c2 to p
        addd    r7,r1                   # add y_inc to bit
        addd    r5,r1                   # add x_inc to bit
        sbitd   r1,@_bit_map            # plot bit
        cmpd    r1,r3
        bne     .LL6                    # end only when bit=last_bit
        exit    [r3,r4,r5,r6,r7]
        ret     $(0)
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/* This program calculates points on a line using Bresenham's iterative */ 


/* method.                                                              */

#include <stdio.h>
#include <stdlib.h>                     /* abs() */

#define xbytes 250                      /* number of bytes along x-axis */
#define warp   (xbytes * 8)             /* number of bits along x_axis */
#define maxy   1999                     /* number of lines in y_axis */
unsigned char bit_map[xbytes*maxy];     /* array contains bit map */
static unsigned char bit_pos[]={1,2,4,8,16,32,64,128};
                                        /* look-up table for setting bit */

void line_draw(int xs, int ys, int xf, int yf)
                              /* starting (s) and finishing (f) points */
{
    int dx,dy,x_inc,y_inc,              /* deltas and increments */
        bit,last_bit,                   /* current and last bit positions */
        p,c1,c2;                        /* decision variable p and constants */

    dx=xf-xs;
    dy=yf-ys;
    bit=(ys*warp)+xs;                   /* initialize bit to first bit pos */
    last_bit=(yf*warp)+xf;              /* calculate last bit on line */

    if (abs(dy) > abs(dx))
    {                                   /* abs(slope)>1 must rotate space */
                                        /* see Figure 4 a.,g.,and h. */
        if (dy>0)
            x_inc=warp;                 /* x_axis is now original y_axis */
        else
            x_inc= -warp;
        if (dx>0)
            y_inc=1;                    /* y_axis is now original x_axis */
        else
            y_inc= -1;
        c1=2*abs(dx);                   /* calculate Bresenham's constants */
        c2=2*(abs(dx)-abs(dy));
        p=2*abs(dx)-abs(dy);            /* p is decision variable now rotated */
    }
    else {                              /* abs(slope)<1 use original axis */
        if (dy>0)
            y_inc=warp;                 /* y_inc is +/-warp number of bits */
        else
            y_inc= -warp;
        if (dx>0)
            x_inc=1;                    /* move forward one bit */
        else
            x_inc= -1;                  /* or backward one bit */
        c1=2*abs(dy);                   /* calculate constants and p */
        c2=2*(abs(dy)-abs(dx));
        p=2*abs(dy)-abs(dx);
    }

    /* Bresenham's Algorithm */
    do                        /* do once for each x increment, i.e. dx times */
    {
        if (p<0)                        /* no y movement if p<0 */
            p+=c1;
        else {                          /* move in y dir if p>0 */
            p+=c2;
            bit+=y_inc;
        }
        bit+=x_inc;                     /* always increment x */

        /* bit is set by calculating bit MOD 8, which is   */
        /* same as bit & 7, then looking up appropriate    */
        /* bit in table bit_pos. This bit pos is then set  */
        /* in byte bit/8                                   */
        bit_map[bit/8] |= bit_pos[(bit&7)];
    } while (bit != last_bit);
}
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/* Program driver.c feeds line vectors to LINE_DRAW.S forming Star-Burst. 
#include <stdio.h> 

#define xbytes 259 

#define maxx 1999 

#define maxy 1999 

unsigned char bit_map[(xbytes*maxy]; 


main () 


int i,count; 
/* generate Star-Burst image */ 
for (count=1;count<=1996 ;test++) { 
for (i=§;i<=maxy;i+=25) 
line <iraw (Ps P| Re re 


for (i=@;i<=maxx; +=25) 
line_draw(i, oS ane: B); 


/* Start timer and call main procedure of DRIVER.C to draw lines */ 


start() { 
long *timer = (long *) $x6$9; 
*timer = @; /* write a zero to timer location */ 
main(#,$); /* Show argc as zero, argv ->f */ 
return(*timer) ; /* return, in rg, the current time */ 
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Block Move Optimization 
Techniques Series 32000® 
Graphics Note 2


1.0 INTRODUCTION 


This application note discusses fast methods of moving
data in printer applications using the National Semiconductor
Series 32000. Typically this data is moved to or from the
band of RAM representing a small portion (or slice) of the
total image. The length of data is fixed. The controller
design may require moving data every few milliseconds to
image the page, until a total of 1 page has been moved. This
may be (at 300 DPI, for example) (8.5 x 300) x (11 x 300) bits,
or 1,051,875 bytes. In current controller designs the width is
often rounded to a word boundary (usually 320 bytes at 300
DPI). This technique uses 1,056,000 bytes, or 528,000
words.
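
The page-size figures above can be checked with a short calculation. The C sketch below is only a worked example of that arithmetic. The function names are hypothetical, and the page is assumed to be 2,550 dots wide (8.5 x 300) and 3,300 dots high (11 x 300).

```c
/* Bytes needed when rows are packed with no padding
   (8 dots per byte). */
unsigned long page_bytes_unpadded(unsigned long width_dots,
                                  unsigned long height_dots)
{
    return (width_dots * height_dots) / 8ul;
}

/* Bytes needed when every row is rounded up to a fixed
   number of bytes (for example 320 bytes at 300 DPI). */
unsigned long page_bytes_padded(unsigned long row_bytes,
                                unsigned long height_dots)
{
    return row_bytes * height_dots;
}
```

Here page_bytes_unpadded(2550, 3300) gives 1,051,875 bytes, and page_bytes_padded(320, 3300) gives 1,056,000 bytes, or 528,000 16-bit words.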


; Version 1.0 Sun Mar 29 12:57:20 1987 


Application Note 526


National Semiconductor 


Dave Rand 


2.0 DESCRIPTION 


The move string instructions (MOVSi) in the 32000 are very
powerful; however, when all that is needed is a string copy,
they may be overkill. The string instructions include string
translation, conditionals and byte/word/double sizes. If the
application needs only to move a block of data from one
location to another, and that data is a known size (or at least
a multiple of a known size), using unrolled MOVD instructions
is a faster way of moving the data from A to B on the
NS32032 and NS32332.


3.0 IMPLEMENTATION


A code sample follows which makes use of a block size of
128 bytes. To move 256 bytes, for example, R0 should
contain 2 on entry.


;A subroutine to move blocks of memory. Uses a granularity of
;128 bytes.
;
;       Inputs:
;               r0 = number of 128 byte blocks to move
;               r1 = source block address
;               r2 = destination block address
;
;       Outputs:
;               r0 = 0
;               r1 = source block address + (128 * blocks)
;               r2 = destination block address + (128 * blocks)
;
;       Notes:
;               This algorithm corresponds closely to the MOVSD instruction,
;               except that r0 contains the number of 128 byte blocks, not
;               4 byte double words. The output values are the same as if a
;               MOVSD instruction were used.
;
movmem: cmpqd   0,r0                    ;if no blocks to move
        beq     mvexit                  ;exit now.
        .align  4
mvlp1:  movd    0(r1),0(r2)             ;move one block of data
        movd    4(r1),4(r2)
        movd    8(r1),8(r2)
        movd    12(r1),12(r2)
        movd    16(r1),16(r2)
        movd    20(r1),20(r2)
        movd    24(r1),24(r2)
        movd    28(r1),28(r2)
        movd    32(r1),32(r2)
        movd    36(r1),36(r2)
        movd    40(r1),40(r2)
        movd    44(r1),44(r2)
        movd    48(r1),48(r2)
        movd    52(r1),52(r2)
        movd    56(r1),56(r2)
        movd    60(r1),60(r2)
        movd    64(r1),64(r2)
        movd    68(r1),68(r2)
        movd    72(r1),72(r2)
        movd    76(r1),76(r2)
        movd    80(r1),80(r2)
        movd    84(r1),84(r2)
        movd    88(r1),88(r2)
        movd    92(r1),92(r2)
        movd    96(r1),96(r2)
        movd    100(r1),100(r2)
        movd    104(r1),104(r2)
        movd    108(r1),108(r2)
        movd    112(r1),112(r2)
        movd    116(r1),116(r2)
        movd    120(r1),120(r2)
        movd    124(r1),124(r2)
        addr    128(r1),r1              ;quick way of adding 128
        addr    128(r2),r2
        acbd    -1,r0,mvlp1             ;loop for rest of blocks
mvexit: ret     $0
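
For readers more familiar with C, the structure of the routine above can be sketched as follows. This is an analogy only: the function name move_blocks is hypothetical, the inner loop is partially unrolled (eight moves per pass) to keep the example short, and a C compiler is free to generate code quite different from the hand-written listing.

```c
#include <stdint.h>
#include <stddef.h>

/* Copies 'blocks' blocks of 128 bytes (32 double words each) from
   src to dst. A block count of zero copies nothing, like the early
   exit in the assembly routine. */
void move_blocks(uint32_t *dst, const uint32_t *src, size_t blocks)
{
    while (blocks != 0) {
        size_t i;
        for (i = 0; i < 32; i += 8) {   /* 4 passes of 8 moves */
            dst[i + 0] = src[i + 0];
            dst[i + 1] = src[i + 1];
            dst[i + 2] = src[i + 2];
            dst[i + 3] = src[i + 3];
            dst[i + 4] = src[i + 4];
            dst[i + 5] = src[i + 5];
            dst[i + 6] = src[i + 6];
            dst[i + 7] = src[i + 7];
        }
        src += 32;                      /* advance both pointers by */
        dst += 32;                      /* 128 bytes                */
        blocks--;
    }
}
```

To move 256 bytes, the sketch is called with blocks = 2, matching the R0 = 2 convention of the assembly routine.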
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4.0 TIMING 


All timing assumes word aligned data (double word aligned 
for 32-bit bus). Unaligned data is permitted, but will reduce 
the speed. 


On the 32532 (no wait states, @ 30 MHz, 32-bit bus), this 
code executes in 204 clocks, assuming burst mode access 
is available. To move 256 bytes, this routine would take 
13.6 ws. The MOVSD instruction takes about 156 clocks to 
move a 128-byte block. The MOVSD instruction is the best 
choice, therefore, on the 32532. 


On the 32332 (no wait states, @ 15 MHz, 32-bit bus), this 
code executes in 458 clocks per 128-byte block. Thus, to 
move 256 bytes, this algorithm takes 61.1 ys. The loop 
overhead (the ADDR and ACBBD instructions) is about 10%. 
Doubling the block size (to 256 bytes) would reduce the 
loop overhead to 5%, and reducing the block size (to 64 
bytes) would increase the loop overhead to 20%. In com- 
parison, the 32332 MOVSD instruction takes about 721 
clocks to move a 128-byte block. 


On the 32032 (no wait states. @ 10 MHz, 32-bit bus), this 
code executes in 634 clocks per 128-byte block. Thus, to 


move 256 bytes, this algorithm takes 126.8 ws. The loop 
overhead (the ADDR and ACBD instructions) is about 5%. 
Doubling the block size (to 256 bytes) would reduce the 
loop overhead to 2.5%, and reducing the block size (to 64 
bytes) would increase the loop overhead to 10%. In com- 
parison, the 32032 MOVSD instruction takes about 690 
clocks to move a 128-byte block. 


On the 32016 (1 wait state, @ 10 MHz, 16-bit bus), this code 
executes in 1150 clocks per 128-byte block. Thus, to move 
256 bytes, this algorithm takes 230.0 µs. The loop overhead 
on the 32016 is about 2.5%. In comparison, the 32016 
MOVSD instruction would take about 1,074 clocks. Thus, 
the MOVSD instruction is faster, and makes better use of 
the available bus bandwidth of the NS32016. 
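
The microsecond figures quoted in this section follow from the clock counts: one 128-byte block takes (clocks / MHz) microseconds, and a 256-byte move is two blocks. The short Python sketch below reproduces that arithmetic; the function and variable names are illustrative and do not come from the application note.

```python
def move_time_us(clocks_per_block, mhz, total_bytes=256, block_bytes=128):
    """Microseconds needed to move total_bytes in whole blocks."""
    blocks = total_bytes // block_bytes
    return blocks * clocks_per_block / mhz

# (clocks per 128-byte block, clock frequency in MHz) as quoted in the text
cases = {
    "32532": (204, 30),
    "32332": (458, 15),
    "32032": (634, 10),
    "32016": (1150, 10),
}

for cpu, (clocks, mhz) in cases.items():
    print(f"{cpu}: {move_time_us(clocks, mhz):.1f} us")
```

The same function can be used to compare the MOVSD clock counts given above by substituting them for the unrolled-loop counts.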


5.0 CONCLUSIONS 

The MOVSi instructions on the NS32016 provide a very fast 
memory block move capability, with variable size. On the 
NS32332 and NS32032, however, unrolled MOVD instruc- 
tions are faster due to the larger bus bandwidth of the 
NS32332 and NS32032. 


Clearing Memory with the 
32000; Series 32000® 
Graphics Note 3 


1.0 INTRODUCTION 


In printer applications, large amounts of RAM may need to 
be initialized to a zero value. This application note describes 
a fast method. 


2.0 DESCRIPTION 


While several different methods of initializing memory to all 
zeros are available, here is one that works very well on the 
Series 32000. While the current version clears memory only 
in blocks of 128 bytes, other block sizes are possible by 
extending the algorithm. 


: Version 1.1 Sun Mar 29 10:22:19 1987 


National Semiconductor 
Application Note 527 
Dave Rand 


3.0 IMPLEMENTATION 


This routine is written to clear blocks of 128 bytes. This 
provides an optimal tradeoff between loop size (granularity) 
and loop overhead. This can be modified to use a different 
size. For example, to use a block size of 64 bytes, simply 
delete 16 of the MOVQD 0,TOS instructions from the listing. 
As well, since the value of r1 is now the number of 64 byte 
groups, one of the ADDD R2,R2 instructions (prior to the 
loading of the stack pointer) must be removed. Since the 
32000 has two stacks, interrupts will be handled properly 
using this code. If only a fixed buffer size needs to be 
cleared, the code can be further unrolled to clear that area 
(i.e., increase the number of MOVQD 0,TOS instructions.) 


;Subroutine to clear a block of memory. The granularity of this 
;algorithm is 128 bytes, to reduce the looping overhead. 
; 
; Inputs: 
;       r0 = start of block 
;       r1 = number of 128-byte groups to clear 
; 
; Outputs: 
;       All registers preserved. 
; 
clram:  cmpqd   0,r1            ;any blocks to clear? 
        beq     clexit:w        ;no, exit now. 
        save    [r0,r1,r2]      ;save our working registers 
                                ;here we set r0 = r0 + (r1 * 128) + 4 
        movd    r1,r2 
        addd    r2,r2           ;length *= 2 
        addd    r2,r2           ;*4 
        addd    r2,r2           ;*8 
        addd    r2,r2           ;*16 
        addr    4(r0)[r2:q],r0  ;get starting point + 4 
        sprd    sp,r2           ;save current stack 
        lprd    sp,r0           ;move to last double 
        .align  4 
cl2:    movqd   0,tos           ;clear a double 
        movqd   0,tos 
        movqd   0,tos 
        movqd   0,tos 
        movqd   0,tos 
        movqd   0,tos 
        movqd   0,tos 
        movqd   0,tos 
        movqd   0,tos 
        movqd   0,tos 
        movqd   0,tos 
        movqd   0,tos 
        movqd   0,tos 
        movqd   0,tos 
        movqd   0,tos 
        movqd   0,tos 
        movqd   0,tos 
        movqd   0,tos 
        movqd   0,tos 
        movqd   0,tos 
        movqd   0,tos 
        movqd   0,tos 
        movqd   0,tos 
        movqd   0,tos 
        movqd   0,tos 
        movqd   0,tos 
        movqd   0,tos 
        movqd   0,tos 
        movqd   0,tos 
        movqd   0,tos 
        movqd   0,tos 
        movqd   0,tos 
        acbd    -1,r1,cl2 
        lprd    sp,r2           ;restore stack pointer 
        restore [r0,r1,r2]      ;restore our saved registers 
clexit: ret     0 

FIGURE 1 


clram:  cmpqd   0,r1            ;any blocks to clear? 
        beq     clexit:w        ;no, exit now. 
        .align  4 
cl2:    movqd   0,0(r0)         ;clear a double 
        movqd   0,4(r0) 
        movqd   0,8(r0) 
        movqd   0,12(r0) 
        movqd   0,16(r0) 
        movqd   0,20(r0) 
        movqd   0,24(r0) 
        movqd   0,28(r0) 
        movqd   0,32(r0) 
        movqd   0,36(r0) 
        movqd   0,40(r0) 
        movqd   0,44(r0) 
        movqd   0,48(r0) 
        movqd   0,52(r0) 
        movqd   0,56(r0) 
        movqd   0,60(r0) 
        movqd   0,64(r0) 
        movqd   0,68(r0) 
        movqd   0,72(r0) 
        movqd   0,76(r0) 
        movqd   0,80(r0) 
        movqd   0,84(r0) 
        movqd   0,88(r0) 
        movqd   0,92(r0) 
        movqd   0,96(r0) 
        movqd   0,100(r0) 
        movqd   0,104(r0) 
        movqd   0,108(r0) 
        movqd   0,112(r0) 
        movqd   0,116(r0) 
        movqd   0,120(r0) 
        movqd   0,124(r0) 
        addd    $128,r0 
        acbd    -1,r1,cl2 
clexit: ret     0 


FIGURE 2 
4.0 TIMING RESULTS 


On the NS32016, NS32032 and NS32332, 4 clock cycles 
per write are required. To clear one page of 300 DPI 
8½ × 11 inch (1,056,000 bytes), for example, requires 264,000 
double words to be written. The optimal time for this, using 
100% of the bus bandwidth on a 16 bit bus, would be 
528,000 * 400 ns, or 211.2 ms, @ 10 MHz. All timing data 
assumes word aligned data (double word aligned for 32 bit 
bus). Unaligned data is permitted, but will reduce the speed 
somewhat. 


On the NS32332 (no wait states, @ 15 MHz, 32 bit bus), this 
code clears the full page image in 178 ms. 


On the NS32032 (no wait states, @ 10 MHz, 32 bit bus), this 
code clears the full page image in 324 ms. 


On the NS32016 (1 wait state, @ 10 MHz, 16 bit bus), this 
code clears the full page image in 509 ms. 


Doubling the block size (to 256 bytes) would increase the 
speed of the code sample by 1% to 2%. 


On the NS32532, a better approach is to use the register 
indirect method of referencing memory, as is shown in Fig- 
ure 2. With this approach, the page memory can be cleared 
in 19 ms, assuming a no wait state 30 MHz system, with a 
32 bit bus. The optimal time, using 100% of the bus band- 
width of the NS32532 (2 clock bus cycle) would be 264,000 
* 66.6 ns, or 17.6 ms. 
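
The two "optimal time" figures above are simple bus-bandwidth bounds: number of bus transfers times clocks per transfer, divided by the clock rate. The Python sketch below, with illustrative names that are not part of the application note, reproduces both numbers from the page size quoted in the text.

```python
PAGE_BYTES = 1_056_000           # page size quoted in the text
WORDS = PAGE_BYTES // 2          # transfers on a 16-bit bus
DOUBLES = PAGE_BYTES // 4        # transfers on a 32-bit bus

def optimal_ms(transfers, clocks_per_transfer, mhz):
    """Lower bound in milliseconds when the bus is 100% busy."""
    microseconds = transfers * clocks_per_transfer / mhz
    return microseconds / 1000.0

# 16-bit bus, 4 clocks per write, 10 MHz (400 ns per write)
print(f"16-bit bus @ 10 MHz: {optimal_ms(WORDS, 4, 10):.1f} ms")
# NS32532, 2-clock bus cycle, 30 MHz (about 66.6 ns per write)
print(f"NS32532  @ 30 MHz: {optimal_ms(DOUBLES, 2, 30):.1f} ms")
```

Comparing these bounds with the measured 509 ms and 19 ms shows how close each routine comes to saturating its bus.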
Image Rotation Algorithm 
Series 32000® Graphics 
Note 4 


1.0 INTRODUCTION 


Fast image rotation of 90 and 270 degrees is important in 
printer applications, since both Portrait and Landscape ori- 
entation printing may be desired. With a fast image rotation 
algorithm, only the Portrait orientation fonts need to be 
stored. This minimizes ROM storage requirements. 


This application note shows a fast image rotation algorithm 
that may be used to rotate an 8 pixel by 8 line image. Larger 
image sizes may be rotated by successive application of the 
rotation primitive. 


2.0 DESCRIPTION 


This Rotate Image algorithm (developed by the Electronic 
Imaging Group at National Semiconductor) does a very fast 
8 by 8 (64 bit) rotation of font data. Note also that this algo- 
rithm does not exclusively deal with fonts, but any 64 bit 
image. Larger images can be rotated by breaking the image 
down into 8 x 8 segments, and using a ‘source warp’ con- 
stant to index into the source data. 


The source data is pointed to by RO on entry. A ‘source 
warp’ is contained in R1, and is added to RO after each read 
of the source font. This allows the rotation of 16 by 16, 32 
by 32 and larger fonts. 

ROTIMG deals with the 8 by 8 destination character as 8 
sequential bytes in two registers (R2 and R3), as follows: 

Destination Font Matrix 

Low Address 
    1 
    2 
    3 
    4        R2 = | 4 | 3 | 2 | 1 | 
    5        R3 = | 8 | 7 | 6 | 5 | 
    6 
    7 
    8 
High Address 


ROTIMG uses an external table (a pointer to the start of the 
table is located in register R4) to speed the rotation and to 
minimize the code. This table consists of 256 64-bit entries, 
or a total of 2,048 bytes. The table may be located code 
(PC) or data (SB) relative. The complete table is at the end 
of this document (see Figure 1). A few entries of the table 
are reproduced below. 

Entry   Definition 

  0     0x00000000 00000000 
  1     0x00000000 00000001 
  2     0x00000000 00000100 
  3     0x00000000 00000101 
253     0x01010101 01010001 
254     0x01010101 01010100 
255     0x01010101 01010101 


The bytes in the table are standard LSB to MSB format. 
Since there is no quad-byte assembler pseudo-op (other 
than LONG, which is floating point), we must reverse the 
‘double’ declaration to get the correct byte ordering, as is 
shown below: 


Entry   Definition 

  0     double 0,0 
  1     double 1,0 
  2     double 256,0 
  3     double 257,0 
253     double 16842753,16843009 
254     double 0x01010100,0x01010101 
255     double 0x01010101,0x01010101 


Each byte within each eight byte table entry represents one 
bit of output data. By indexing into the table, and ORing the 
table’s contents with R2 and R3, we set the destination byte 
if the corresponding source bit is set. In this manner, the 
character is rotated. 
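
The table does not have to be typed in by hand. As a sketch of the rule described above (bit i of the index sets the least significant bit of byte i of the 64-bit entry), the following Python generates the entries and prints the sample rows in both notations. The helper name `rot_entry` is an assumption made for illustration.

```python
def rot_entry(n):
    """Return (low, high) 32-bit halves of table entry n.

    Bit i of n becomes 0x01 in byte i of the 64-bit entry.
    """
    value = 0
    for bit in range(8):
        if n & (1 << bit):
            value |= 1 << (8 * bit)
    return value & 0xFFFFFFFF, value >> 32

table = [rot_entry(n) for n in range(256)]

for n in (0, 1, 2, 3, 253, 254, 255):
    low, high = table[n]
    # low half is declared first so the bytes land in LSB to MSB order
    print(f"{n:3d}  0x{high:08X} {low:08X}   double {low},{high}")
```

The low half is emitted first in the `double` form, which is the reversal the text describes.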


3.0 IMPLEMENTATION 


What we are doing is setting the LS Bit of the destination 
byte if the source bit corresponding to that byte is set. We 
then shift the entire 64 bit destination left one bit, and repeat 
this process until we have set all eight bits, and processed 
all eight bytes of source information. 


The source data for an 8 by 8 character ‘>’ appears below: 


Character Table for ‘>’ 

        Bit Number   Hex Value 
        01234567 
Byte 0  01000000     02 
     1  00100000     04 
     2  00010000     08 
     3  00001000     10 
     4  00001000     10 
     5  00010000     08 
     6  00100000     04 
     7  01000000     02 


#Rotate image emulation code 
# 
# Inputs: 
#       R0 = Source font address 
#       R1 = Source font warp 
#       R4 = Rotate table address 
# 
# Outputs: 
#       R2 = Destination font low 4 bytes (lsb->msb, 0 - 3) 
#       R3 = Destination font high 4 bytes (lsb->msb, 4 - 7) 
# 
ROTIMG: save    [r0,r5,r6,r7]   #save registers we will use 
        movqd   0,r2            #clear destination font 
        movd    r2,r3           #clear high bits of dest. 
        movd    r2,r5           #clear high bits of temp. 
        addr    8,r6            #deal with 8 bytes of src. 
rotlp:  movb    0(r0),r5        #get a byte of source 
        addd    r1,r0           #add source warp 
        addd    r2,r2           #shift destination left one bit 
        addd    r3,r3           #top 32 bits too 
        addr    r4[r5:q],r7     #get pointer to table 
        ord     0(r7),r2        #or in low bits 
        ord     4(r7),r3        #or in high bits 
        acbd    -1,r6,rotlp     #and back for more 
        restore [r0,r5,r6,r7]   #restore registers 
        ret     $0              #and return 

Now, let’s look at what happens to the data, given the example font of ‘>’. 


Loop #  Source Font  R3        R2 

  0     --           00000000  00000000  ;0 destination 
  1     02 hex       00000000  00000100  ;first bits in 
  2     04           00000000  00010200  ;next bits in 
  3     08           00000000  01020400  ;and so on 
  4     10           00000001  02040800 
  5     10           00000003  04081000 
  6     08           00000006  09102000 
  7     04           0000000C  12214000 
  8     02           00000018  24428100  ;last iteration 

Now, arranging this in the appropriate order gives us: 

Destination Character Table for ‘>’, 90 degree     Destination Character Table for ‘>’, 270 degree 

        Bit Number   Hex Value                             Bit Number   Hex Value 
        01234567                                           01234567 
Byte 0  00000000     00                            Byte 0  00000000     00 
     1  10000001     81                                 1  00000000     00 
     2  01000010     42                                 2  00000000     00 
     3  00100100     24                                 3  00011000     18 
     4  00011000     18                                 4  00100100     24 
     5  00000000     00                                 5  01000010     42 
     6  00000000     00                                 6  10000001     81 
     7  00000000     00                                 7  00000000     00 


Note that by re-ordering the output data, we may rotate 90 or 270 degrees. This may also be accomplished by using a different 
table (see Figure 2). 
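
The trace above can be checked with a small software model of the loop. The Python below is only a sketch of the shift-and-OR scheme: it takes the eight source bytes as a list (so the source warp is not modeled), and the names `rot_entry` and `rotimg` are illustrative rather than taken from the note.

```python
def rot_entry(n):
    """(low, high) halves of the 90 degree table entry for byte n."""
    value = 0
    for bit in range(8):
        if n & (1 << bit):
            value |= 1 << (8 * bit)
    return value & 0xFFFFFFFF, value >> 32

def rotimg(src):
    """Model of the ROTIMG loop: shift both halves left, then OR in the entry."""
    r2 = r3 = 0
    for byte in src:
        r2 = (r2 << 1) & 0xFFFFFFFF    # addd r2,r2
        r3 = (r3 << 1) & 0xFFFFFFFF    # addd r3,r3
        low, high = rot_entry(byte)
        r2 |= low                      # ord 0(r7),r2
        r3 |= high                     # ord 4(r7),r3
    return r2, r3

glyph = [0x02, 0x04, 0x08, 0x10, 0x10, 0x08, 0x04, 0x02]   # the '>' character
r2, r3 = rotimg(glyph)
rot90 = list(r2.to_bytes(4, "little") + r3.to_bytes(4, "little"))
rot270 = rot90[::-1]                   # re-ordering the output bytes

print(f"R3={r3:08X} R2={r2:08X}")
print("90: ", " ".join(f"{b:02X}" for b in rot90))
print("270:", " ".join(f"{b:02X}" for b in rot270))
```

No carry is needed between the two 32-bit halves because each destination byte receives at most eight one-bit shifts.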
4.0 TIMING 
With unrolled 32000 code, the time for this algorithm is about 588 clocks on the 32016. Subtracting the font read time from this 
(about 113 clocks), the actual time for rotation is 475 clocks. On the 32332, the time is about 388 clocks. On the 32532, the 
unrolled loop time is 120-180 clocks, depending on burst mode availability. Repetition of the character data also affects the 
32532, due to the data cache. See Figure 3 for an unrolled code listing. 


This table is used for the ROTIMG code. It is 256 entries of 64 bits each (8 bytes * 256 = 2048 bytes). There are two entries per 
line. This table is used for 90° rotation. 


rottab1: 
        .double 0x00000000,0x00000000,0x00000001,0x00000000 ;0,1 
        .double 0x00000100,0x00000000,0x00000101,0x00000000 ;2,3 
        .double 0x00010000,0x00000000,0x00010001,0x00000000 ;4,5 
        .double 0x00010100,0x00000000,0x00010101,0x00000000 ;6,7 
        .double 0x01000000,0x00000000,0x01000001,0x00000000 ;... 
        .double 0x01000100,0x00000000,0x01000101,0x00000000 
        .double 0x01010000,0x00000000,0x01010001,0x00000000 
        .double 0x01010100,0x00000000,0x01010101,0x00000000 
        .double 0x00000000,0x00000001,0x00000001,0x00000001 
        .double 0x00000100,0x00000001,0x00000101,0x00000001 
        .double 0x00010000,0x00000001,0x00010001,0x00000001 
        .double 0x00010100,0x00000001,0x00010101,0x00000001 
        .double 0x01000000,0x00000001,0x01000001,0x00000001 
        .double 0x01000100,0x00000001,0x01000101,0x00000001 
        .double 0x01010000,0x00000001,0x01010001,0x00000001 
        .double 0x01010100,0x00000001,0x01010101,0x00000001 
        .double 0x00000000,0x00000100,0x00000001,0x00000100 
        .double 0x00000100,0x00000100,0x00000101,0x00000100 
        .double 0x00010000,0x00000100,0x00010001,0x00000100 
        .double 0x00010100,0x00000100,0x00010101,0x00000100 
        .double 0x01000000,0x00000100,0x01000001,0x00000100 
        .double 0x01000100,0x00000100,0x01000101,0x00000100 
        .double 0x01010000,0x00000100,0x01010001,0x00000100 
        .double 0x01010100,0x00000100,0x01010101,0x00000100 
        .double 0x00000000,0x00000101,0x00000001,0x00000101 
        .double 0x00000100,0x00000101,0x00000101,0x00000101 
        .double 0x00010000,0x00000101,0x00010001,0x00000101 
        .double 0x00010100,0x00000101,0x00010101,0x00000101 
        .double 0x01000000,0x00000101,0x01000001,0x00000101 
        .double 0x01000100,0x00000101,0x01000101,0x00000101 
        .double 0x01010000,0x00000101,0x01010001,0x00000101 
        .double 0x01010100,0x00000101,0x01010101,0x00000101 
        .double 0x00000000,0x00010000,0x00000001,0x00010000 
        .double 0x00000100,0x00010000,0x00000101,0x00010000 
        .double 0x00010000,0x00010000,0x00010001,0x00010000 
        .double 0x00010100,0x00010000,0x00010101,0x00010000 
        .double 0x01000000,0x00010000,0x01000001,0x00010000 
        .double 0x01000100,0x00010000,0x01000101,0x00010000 
        .double 0x01010000,0x00010000,0x01010001,0x00010000 
        .double 0x01010100,0x00010000,0x01010101,0x00010000 
        .double 0x00000000,0x00010001,0x00000001,0x00010001 
        .double 0x00000100,0x00010001,0x00000101,0x00010001 
        .double 0x00010000,0x00010001,0x00010001,0x00010001 
        .double 0x00010100,0x00010001,0x00010101,0x00010001 
        .double 0x01000000,0x00010001,0x01000001,0x00010001 
        .double 0x01000100,0x00010001,0x01000101,0x00010001 
        .double 0x01010000,0x00010001,0x01010001,0x00010001 
        .double 0x01010100,0x00010001,0x01010101,0x00010001 
        .double 0x00000000,0x00010100,0x00000001,0x00010100 
        .double 0x00000100,0x00010100,0x00000101,0x00010100 
        .double 0x00010000,0x00010100,0x00010001,0x00010100 
        .double 0x00010100,0x00010100,0x00010101,0x00010100 
        .double 0x01000000,0x00010100,0x01000001,0x00010100 
        .double 0x01000100,0x00010100,0x01000101,0x00010100 
        .double 0x01010000,0x00010100,0x01010001,0x00010100 
        .double 0x01010100,0x00010100,0x01010101,0x00010100 
        .double 0x00000000,0x00010101,0x00000001,0x00010101 
        .double 0x00000100,0x00010101,0x00000101,0x00010101 
        .double 0x00010000,0x00010101,0x00010001,0x00010101 
        .double 0x00010100,0x00010101,0x00010101,0x00010101 
        .double 0x01000000,0x00010101,0x01000001,0x00010101 
        .double 0x01000100,0x00010101,0x01000101,0x00010101 
        .double 0x01010000,0x00010101,0x01010001,0x00010101 
        .double 0x01010100,0x00010101,0x01010101,0x00010101 
        .double 0x00000000,0x01000000,0x00000001,0x01000000 
        .double 0x00000100,0x01000000,0x00000101,0x01000000 
        .double 0x00010000,0x01000000,0x00010001,0x01000000 
        .double 0x00010100,0x01000000,0x00010101,0x01000000 
        .double 0x01000000,0x01000000,0x01000001,0x01000000 
        .double 0x01000100,0x01000000,0x01000101,0x01000000 
        .double 0x01010000,0x01000000,0x01010001,0x01000000 
        .double 0x01010100,0x01000000,0x01010101,0x01000000 
        .double 0x00000000,0x01000001,0x00000001,0x01000001 
        .double 0x00000100,0x01000001,0x00000101,0x01000001 
        .double 0x00010000,0x01000001,0x00010001,0x01000001 
        .double 0x00010100,0x01000001,0x00010101,0x01000001 
        .double 0x01000000,0x01000001,0x01000001,0x01000001 
        .double 0x01000100,0x01000001,0x01000101,0x01000001 
        .double 0x01010000,0x01000001,0x01010001,0x01000001 
        .double 0x01010100,0x01000001,0x01010101,0x01000001 
        .double 0x00000000,0x01000100,0x00000001,0x01000100 
        .double 0x00000100,0x01000100,0x00000101,0x01000100 
        .double 0x00010000,0x01000100,0x00010001,0x01000100 
        .double 0x00010100,0x01000100,0x00010101,0x01000100 
        .double 0x01000000,0x01000100,0x01000001,0x01000100 
        .double 0x01000100,0x01000100,0x01000101,0x01000100 
        .double 0x01010000,0x01000100,0x01010001,0x01000100 
        .double 0x01010100,0x01000100,0x01010101,0x01000100 
        .double 0x00000000,0x01000101,0x00000001,0x01000101 
        .double 0x00000100,0x01000101,0x00000101,0x01000101 
        .double 0x00010000,0x01000101,0x00010001,0x01000101 
        .double 0x00010100,0x01000101,0x00010101,0x01000101 
        .double 0x01000000,0x01000101,0x01000001,0x01000101 
        .double 0x01000100,0x01000101,0x01000101,0x01000101 
        .double 0x01010000,0x01000101,0x01010001,0x01000101 
        .double 0x01010100,0x01000101,0x01010101,0x01000101 
        .double 0x00000000,0x01010000,0x00000001,0x01010000 
        .double 0x00000100,0x01010000,0x00000101,0x01010000 
        .double 0x00010000,0x01010000,0x00010001,0x01010000 
        .double 0x00010100,0x01010000,0x00010101,0x01010000 
        .double 0x01000000,0x01010000,0x01000001,0x01010000 
        .double 0x01000100,0x01010000,0x01000101,0x01010000 
        .double 0x01010000,0x01010000,0x01010001,0x01010000 
        .double 0x01010100,0x01010000,0x01010101,0x01010000 
        .double 0x00000000,0x01010001,0x00000001,0x01010001 
        .double 0x00000100,0x01010001,0x00000101,0x01010001 
        .double 0x00010000,0x01010001,0x00010001,0x01010001 
        .double 0x00010100,0x01010001,0x00010101,0x01010001 
        .double 0x01000000,0x01010001,0x01000001,0x01010001 
        .double 0x01000100,0x01010001,0x01000101,0x01010001 
        .double 0x01010000,0x01010001,0x01010001,0x01010001 
        .double 0x01010100,0x01010001,0x01010101,0x01010001 
        .double 0x00000000,0x01010100,0x00000001,0x01010100 
        .double 0x00000100,0x01010100,0x00000101,0x01010100 
        .double 0x00010000,0x01010100,0x00010001,0x01010100 
        .double 0x00010100,0x01010100,0x00010101,0x01010100 
        .double 0x01000000,0x01010100,0x01000001,0x01010100 
        .double 0x01000100,0x01010100,0x01000101,0x01010100 
        .double 0x01010000,0x01010100,0x01010001,0x01010100 
        .double 0x01010100,0x01010100,0x01010101,0x01010100 
        .double 0x00000000,0x01010101,0x00000001,0x01010101 
        .double 0x00000100,0x01010101,0x00000101,0x01010101 
        .double 0x00010000,0x01010101,0x00010001,0x01010101 
        .double 0x00010100,0x01010101,0x00010101,0x01010101 
        .double 0x01000000,0x01010101,0x01000001,0x01010101 
        .double 0x01000100,0x01010101,0x01000101,0x01010101 ;250,251 
        .double 0x01010000,0x01010101,0x01010001,0x01010101 ;252,253 
        .double 0x01010100,0x01010101,0x01010101,0x01010101 ;254,255 


FIGURE 1 

This table is used for the ROTIMG code. It is 256 entries of 64 bits each (8 bytes * 256 = 2048 bytes). There are two entries per 
line. This gives a 270° rotation. 


rottab2: 
        .double 0x00000000,0x00000000,0x00000000,0x01000000 
        .double 0x00000000,0x00010000,0x00000000,0x01010000 
        .double 0x00000000,0x00000100,0x00000000,0x01000100 
        .double 0x00000000,0x00010100,0x00000000,0x01010100 
        .double 0x00000000,0x00000001,0x00000000,0x01000001 
        .double 0x00000000,0x00010001,0x00000000,0x01010001 
        .double 0x00000000,0x00000101,0x00000000,0x01000101 
        .double 0x00000000,0x00010101,0x00000000,0x01010101 
        .double 0x01000000,0x00000000,0x01000000,0x01000000 
        .double 0x01000000,0x00010000,0x01000000,0x01010000 
        .double 0x01000000,0x00000100,0x01000000,0x01000100 
        .double 0x01000000,0x00010100,0x01000000,0x01010100 
        .double 0x01000000,0x00000001,0x01000000,0x01000001 
        .double 0x01000000,0x00010001,0x01000000,0x01010001 
        .double 0x01000000,0x00000101,0x01000000,0x01000101 
        .double 0x01000000,0x00010101,0x01000000,0x01010101 
        .double 0x00010000,0x00000000,0x00010000,0x01000000 
        .double 0x00010000,0x00010000,0x00010000,0x01010000 
        .double 0x00010000,0x00000100,0x00010000,0x01000100 
        .double 0x00010000,0x00010100,0x00010000,0x01010100 
        .double 0x00010000,0x00000001,0x00010000,0x01000001 
        .double 0x00010000,0x00010001,0x00010000,0x01010001 
        .double 0x00010000,0x00000101,0x00010000,0x01000101 
        .double 0x00010000,0x00010101,0x00010000,0x01010101 
        .double 0x01010000,0x00000000,0x01010000,0x01000000 
        .double 0x01010000,0x00010000,0x01010000,0x01010000 
        .double 0x01010000,0x00000100,0x01010000,0x01000100 
        .double 0x01010000,0x00010100,0x01010000,0x01010100 
        .double 0x01010000,0x00000001,0x01010000,0x01000001 
        .double 0x01010000,0x00010001,0x01010000,0x01010001 
        .double 0x01010000,0x00000101,0x01010000,0x01000101 
        .double 0x01010000,0x00010101,0x01010000,0x01010101 
        .double 0x00000100,0x00000000,0x00000100,0x01000000 
        .double 0x00000100,0x00010000,0x00000100,0x01010000 
        .double 0x00000100,0x00000100,0x00000100,0x01000100 
        .double 0x00000100,0x00010100,0x00000100,0x01010100 
        .double 0x00000100,0x00000001,0x00000100,0x01000001 
        .double 0x00000100,0x00010001,0x00000100,0x01010001 
        .double 0x00000100,0x00000101,0x00000100,0x01000101 
        .double 0x00000100,0x00010101,0x00000100,0x01010101 
        .double 0x01000100,0x00000000,0x01000100,0x01000000 
        .double 0x01000100,0x00010000,0x01000100,0x01010000 
        .double 0x01000100,0x00000100,0x01000100,0x01000100 
        .double 0x01000100,0x00010100,0x01000100,0x01010100 
        .double 0x01000100,0x00000001,0x01000100,0x01000001 
        .double 0x01000100,0x00010001,0x01000100,0x01010001 
        .double 0x01000100,0x00000101,0x01000100,0x01000101 
        .double 0x01000100,0x00010101,0x01000100,0x01010101 
        .double 0x00010100,0x00000000,0x00010100,0x01000000 
        .double 0x00010100,0x00010000,0x00010100,0x01010000 
        .double 0x00010100,0x00000100,0x00010100,0x01000100 
        .double 0x00010100,0x00010100,0x00010100,0x01010100 
        .double 0x00010100,0x00000001,0x00010100,0x01000001 
        .double 0x00010100,0x00010001,0x00010100,0x01010001 
        .double 0x00010100,0x00000101,0x00010100,0x01000101 
        .double 0x00010100,0x00010101,0x00010100,0x01010101 
        .double 0x01010100,0x00000000,0x01010100,0x01000000 
        .double 0x01010100,0x00010000,0x01010100,0x01010000 
        .double 0x01010100,0x00000100,0x01010100,0x01000100 
        .double 0x01010100,0x00010100,0x01010100,0x01010100 
        .double 0x01010100,0x00000001,0x01010100,0x01000001 
        .double 0x01010100,0x00010001,0x01010100,0x01010001 
        .double 0x01010100,0x00000101,0x01010100,0x01000101 
        .double 0x01010100,0x00010101,0x01010100,0x01010101 
        .double 0x00000001,0x00000000,0x00000001,0x01000000 
        .double 0x00000001,0x00010000,0x00000001,0x01010000 
        .double 0x00000001,0x00000100,0x00000001,0x01000100 
        .double 0x00000001,0x00010100,0x00000001,0x01010100 
        .double 0x00000001,0x00000001,0x00000001,0x01000001 
        .double 0x00000001,0x00010001,0x00000001,0x01010001 
        .double 0x00000001,0x00000101,0x00000001,0x01000101 
        .double 0x00000001,0x00010101,0x00000001,0x01010101 
        .double 0x01000001,0x00000000,0x01000001,0x01000000 
        .double 0x01000001,0x00010000,0x01000001,0x01010000 
        .double 0x01000001,0x00000100,0x01000001,0x01000100 
        .double 0x01000001,0x00010100,0x01000001,0x01010100 
        .double 0x01000001,0x00000001,0x01000001,0x01000001 
        .double 0x01000001,0x00010001,0x01000001,0x01010001 
        .double 0x01000001,0x00000101,0x01000001,0x01000101 
        .double 0x01000001,0x00010101,0x01000001,0x01010101 
        .double 0x00010001,0x00000000,0x00010001,0x01000000 
        .double 0x00010001,0x00010000,0x00010001,0x01010000 
        .double 0x00010001,0x00000100,0x00010001,0x01000100 
        .double 0x00010001,0x00010100,0x00010001,0x01010100 
        .double 0x00010001,0x00000001,0x00010001,0x01000001 
        .double 0x00010001,0x00010001,0x00010001,0x01010001 
        .double 0x00010001,0x00000101,0x00010001,0x01000101 
        .double 0x00010001,0x00010101,0x00010001,0x01010101 
        .double 0x01010001,0x00000000,0x01010001,0x01000000 
        .double 0x01010001,0x00010000,0x01010001,0x01010000 
        .double 0x01010001,0x00000100,0x01010001,0x01000100 
        .double 0x01010001,0x00010100,0x01010001,0x01010100 
        .double 0x01010001,0x00000001,0x01010001,0x01000001 
        .double 0x01010001,0x00010001,0x01010001,0x01010001 
        .double 0x01010001,0x00000101,0x01010001,0x01000101 
        .double 0x01010001,0x00010101,0x01010001,0x01010101 
        .double 0x00000101,0x00000000,0x00000101,0x01000000 
        .double 0x00000101,0x00010000,0x00000101,0x01010000 
        .double 0x00000101,0x00000100,0x00000101,0x01000100 
        .double 0x00000101,0x00010100,0x00000101,0x01010100 
        .double 0x00000101,0x00000001,0x00000101,0x01000001 
        .double 0x00000101,0x00010001,0x00000101,0x01010001 
        .double 0x00000101,0x00000101,0x00000101,0x01000101 
        .double 0x00000101,0x00010101,0x00000101,0x01010101 
        .double 0x01000101,0x00000000,0x01000101,0x01000000 
        .double 0x01000101,0x00010000,0x01000101,0x01010000 
        .double 0x01000101,0x00000100,0x01000101,0x01000100 
        .double 0x01000101,0x00010100,0x01000101,0x01010100 
        .double 0x01000101,0x00000001,0x01000101,0x01000001 
        .double 0x01000101,0x00010001,0x01000101,0x01010001 
        .double 0x01000101,0x00000101,0x01000101,0x01000101 
        .double 0x01000101,0x00010101,0x01000101,0x01010101 
        .double 0x00010101,0x00000000,0x00010101,0x01000000 
        .double 0x00010101,0x00010000,0x00010101,0x01010000 
        .double 0x00010101,0x00000100,0x00010101,0x01000100 
        .double 0x00010101,0x00010100,0x00010101,0x01010100 
        .double 0x00010101,0x00000001,0x00010101,0x01000001 
        .double 0x00010101,0x00010001,0x00010101,0x01010001 
        .double 0x00010101,0x00000101,0x00010101,0x01000101 
        .double 0x00010101,0x00010101,0x00010101,0x01010101 
        .double 0x01010101,0x00000000,0x01010101,0x01000000 
        .double 0x01010101,0x00010000,0x01010101,0x01010000 
        .double 0x01010101,0x00000100,0x01010101,0x01000100 
        .double 0x01010101,0x00010100,0x01010101,0x01010100 
        .double 0x01010101,0x00000001,0x01010101,0x01000001 
        .double 0x01010101,0x00010001,0x01010101,0x01010001 
        .double 0x01010101,0x00000101,0x01010101,0x01000101 
        .double 0x01010101,0x00010101,0x01010101,0x01010101 
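
The 270° table differs from the 90° table only in which byte each source bit selects: bit i of the index sets byte 7 - i instead of byte i. As a hedged sketch (the names `rot270_entry` and `rotimg270` are illustrative, and the source warp is not modeled), the Python below regenerates the entries and rotates the ‘>’ character with them.

```python
def rot270_entry(n):
    """(low, high) halves of the 270 degree entry: bit i of n sets byte 7 - i."""
    value = 0
    for bit in range(8):
        if n & (1 << bit):
            value |= 1 << (8 * (7 - bit))
    return value & 0xFFFFFFFF, value >> 32

def rotimg270(src):
    """Same shift-and-OR loop as ROTIMG, using the 270 degree entries."""
    r2 = r3 = 0
    for byte in src:
        r2 = (r2 << 1) & 0xFFFFFFFF
        r3 = (r3 << 1) & 0xFFFFFFFF
        low, high = rot270_entry(byte)
        r2 |= low
        r3 |= high
    return list(r2.to_bytes(4, "little") + r3.to_bytes(4, "little"))

# First line of the listing: entries 0 and 1
for n in (0, 1):
    low, high = rot270_entry(n)
    print(f"entry {n}: 0x{low:08x},0x{high:08x}")

glyph = [0x02, 0x04, 0x08, 0x10, 0x10, 0x08, 0x04, 0x02]   # the '>' character
print(" ".join(f"{b:02X}" for b in rotimg270(glyph)))
```

Using this table trades 2,048 more bytes of storage for not having to re-order the output bytes.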
The following is an unrolled version of the rotate image algorithm. For the NS32532, the address computation, currently 
done with a separate addr instruction, may be done with the ORD instruction. This makes the execution time slightly faster. 


# 
#Rotate image emulation code 
# 
# Inputs: 
#       R0 = Source font address 
#       R1 = Source font warp 
#       R4 = Rotate table address 
# 
# Outputs: 
#       R2 = Destination font low 4 bytes (lsb->msb, 0 - 3) 
#       R3 = Destination font high 4 bytes (lsb->msb, 4 - 7) 
# 
ROTIMG: 
        movqd   0,r2            #clear destination font 
        movd    r2,r3           #clear high bits of dest. 
        movd    r2,r5           #clear high bits of temp. 
        movb    0(r0),r5        #get a byte of source 
        addd    r1,r0           #add source warp 
        addd    r2,r2           #shift destination left one bit 
        addd    r3,r3           #top 32 bits too 
        addr    r4[r5:q],r6     #get pointer to table 
        ord     0(r6),r2        #or in low bits 
        ord     4(r6),r3        #or in high bits 
        movb    0(r0),r5        #get a byte of source 
        addd    r1,r0           #add source warp 
        addd    r2,r2           #shift destination left one bit 
        addd    r3,r3           #top 32 bits too 
        addr    r4[r5:q],r6     #get pointer to table 
        ord     0(r6),r2        #or in low bits 
        ord     4(r6),r3        #or in high bits 
        movb    0(r0),r5        #get a byte of source 
        addd    r1,r0           #add source warp 
        addd    r2,r2           #shift destination left one bit 
        addd    r3,r3           #top 32 bits too 
        addr    r4[r5:q],r6     #get pointer to table 
        ord     0(r6),r2        #or in low bits 
        ord     4(r6),r3        #or in high bits 
        movb    0(r0),r5        #get a byte of source 
        addd    r1,r0           #add source warp 
        addd    r2,r2           #shift destination left one bit 
        addd    r3,r3           #top 32 bits too 
        addr    r4[r5:q],r6     #get pointer to table 
        ord     0(r6),r2        #or in low bits 
        ord     4(r6),r3        #or in high bits 
        movb    0(r0),r5        #get a byte of source 
        addd    r1,r0           #add source warp 
        addd    r2,r2           #shift destination left one bit 
        addd    r3,r3           #top 32 bits too 
        addr    r4[r5:q],r6     #get pointer to table 
        ord     0(r6),r2        #or in low bits 
        ord     4(r6),r3        #or in high bits 
        movb    0(r0),r5        #get a byte of source 
        addd    r1,r0           #add source warp 
        addd    r2,r2           #shift destination left one bit 
        addd    r3,r3           #top 32 bits too 
        addr    r4[r5:q],r6     #get pointer to table 
        ord     0(r6),r2        #or in low bits 
        ord     4(r6),r3        #or in high bits 
        movb    0(r0),r5        #get a byte of source 
        addd    r1,r0           #add source warp 
        addd    r2,r2           #shift destination left one bit 
        addd    r3,r3           #top 32 bits too 
        addr    r4[r5:q],r6     #get pointer to table 
        ord     0(r6),r2        #or in low bits 
        ord     4(r6),r3        #or in high bits 
        movb    0(r0),r5        #get a byte of source 
        addd    r1,r0           #add source warp 
        addd    r2,r2           #shift destination left one bit 
        addd    r3,r3           #top 32 bits too 
        addr    r4[r5:q],r6     #get pointer to table 
        ord     0(r6),r2        #or in low bits 
        ord     4(r6),r3        #or in high bits 
        ret     $0              #and return 
80x86 to Series 32000® 


Translation; Series 32000 
Graphics Note 6 


1.0 INTRODUCTION 


This application note discusses the conversion of Intel 
8088, 8086, 80188 and 80186 (referred to here as 80x86) 
source assembly language to Series 32000 source code. As 
this is not intended to be a tutorial on Series 32000 assem- 
bly language, please see the Series 32000 Programmers 
Reference Manual for more information on instructions and 
addressing modes. 


2.0 DESCRIPTION 


The 80x86 model has 6 general purpose registers (AX, BX, 
CX, DX, SI, DI), each 16 bits wide. 4 of these registers can 
be further addressed as 8-bit registers (AL, AH, BL, BH, CL, 
CH, DL, DH). Series 32000 has 8 general purpose registers 
(R0-R7), each 32 bits wide. Each Series 32000 register 
may be accessed as an 8-, 16- or 32-bit register. Two special 
purpose registers on the 80x86, SP and BP, are 16-bit 
stack and base pointers. These are represented in Series 
32000 with the SP and FP registers, each 32-bit. 


The 80x86 model is capable of addressing up to 1 Megabyte 
of memory. Since the 16-bit register pointers are only 
capable of addressing 64 kbytes, 4 segment registers (CS, 
DS, ES, SS) are used in combination with the basic registers 
to point to memory. Series 32000 registers and addressing 
modes are all full 32-bit, and may point anywhere in the 
16 Megabyte (or 4 Gigabyte, depending on processor model) 
addressing range. 


National Semiconductor 
Application Note 529 
Device ports are given their own 16-bit address on the 
80x86, and there is a complement of instructions to handle 
input and output to these ports. Device ports on Series 
32000 are memory mapped, and all instructions are avail- 
able for port manipulation. 


There are 6 addressing modes for data memory on the 
80x86: Immediate, Direct, Direct indexed, Implied, Base relative 
and Stack. There are 9 addressing modes on Series 
32000: Register, Immediate, Absolute, Register-relative, 
Memory-relative, Memory space, External, Top-of-stack and 
Scaled index. Scaled index may be applied to any of the 
addressing modes (except scaled index) to create more 
addressing modes. The following figure shows the 80x86 
addressing modes, and their Series 32000 counterparts. 


Series 32000 assembly code reads left-to-right, meaning 
source is on the left, destination on the right. As you can 
see, most of the 80x86 addressing modes fall into the regis- 
ter-relative class of Series 32000. Also note that the ADDW 
could have been ADDD, performing a 32-bit add instead of 
only a 16-bit. 

Series 32000 also permits memory-to-memory (two ad- 
dress) operation. A common operation like adding two vari- 
ables is easier in Series 32000. Series 32000 has the same 
form for all math operations (multiply, divide, subtract), as 
well as all logical operators. 


80x86                                                 Series 32000 

ADD AX,1234        Immediate                          ADDW $1234,R0 
ADD AX,LAB1        Direct                             ADDW LAB1,R0 
ADD AX,16[SI]      Direct Indexed                     ADDW 16(R6),R0 
ADD AX,[SI]        Implied                            ADDW 0(R6),R0 
ADD AX,[BX]        Base Relative                      ADDW 0(R1),R0 
ADD AX,[BX+SI]     Base Relative Implied              ADDW R1[R6:B],R0 
ADD AX,12[BX+SI]   Base Relative Implied Indexed      ADDW 12(R1)[R6:B],R0 
ADD AX,4[BP]       Stack (Relative)                   ADDW 4(FP),R0 
PUSH AX            Stack                              MOVW R0,TOS 


80x86              Series 32000 

MOV AL,LAB1        ADDB LAB1,LAB2       8-Bit Add Operation 
ADD LAB2,AL 

MOV AX,LAB3        ADDW LAB3,LAB4       16-Bit Add Operation 
ADD LAB4,AX 

MOV AX,LAB5L       ADDD LAB5,LAB6       32-Bit Add Operation 
ADD LAB6L,AX 
MOV AX,LAB5H 
ADC LAB6H,AX 
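
The 32-bit case in the table is the one where the two architectures differ most: the 80x86 sequence adds the low 16-bit halves, then adds the high halves together with the carry out of the first add, while Series 32000 does the whole thing in one ADDD. The Python below is a minimal model of that 16-bit sequence, written only to illustrate the carry propagation; the function name is an assumption.

```python
MASK16 = 0xFFFF

def add32_via_16(a, b):
    """Add two 32-bit values using two 16-bit adds, as the 80x86 code does."""
    lo = (a & MASK16) + (b & MASK16)         # add of the low halves
    carry = lo >> 16                         # carry flag left by that add
    hi = (a >> 16) + (b >> 16) + carry       # add-with-carry of the high halves
    return ((hi & MASK16) << 16) | (lo & MASK16)

a, b = 0x0001FFFF, 0x00000001
print(f"0x{add32_via_16(a, b):08X}")         # same result as one 32-bit add
print(f"0x{(a + b) & 0xFFFFFFFF:08X}")
```

Note that the high-half add depends on a flag set by the low-half add, which is exactly the kind of flag side effect that has to be watched when translating.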
Most 80x86 instructions have direct Series 32000
equivalents, with one major difference. Most 80x86
instructions affect the flags. Most Series 32000 instructions
do not affect the flags in the same manner. For example, the
80x86 ADD instruction affects the Overflow, Carry, Auxiliary
Carry, Zero, Sign and Parity flags. The Series 32000 ADD
instruction affects only the Overflow and Carry flags.
Programs that rely on side-effects of instructions which set
flags must be changed in order to work correctly on Series 32000.
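
As an illustration of this difference, the following Python sketch models an 8-bit add and reports which flags each processor would update. The function add8, the flag names and the dictionary layout are assumptions made for this example; the two flag sets are the ones listed in the paragraph above.

```python
def add8(a, b):
    """Add two 8-bit values and compute the flags an 80x86 ADD would set."""
    total = a + b
    result = total & 0xFF
    flags_80x86 = {
        "carry": total > 0xFF,
        # signed overflow: both operands differ in sign from the result
        "overflow": ((a ^ result) & (b ^ result) & 0x80) != 0,
        "aux_carry": ((a & 0x0F) + (b & 0x0F)) > 0x0F,
        "zero": result == 0,
        "sign": (result & 0x80) != 0,
        "parity": bin(result).count("1") % 2 == 0,  # set on even parity
    }
    # Series 32000 ADD updates only carry and overflow.
    flags_32000 = {k: flags_80x86[k] for k in ("carry", "overflow")}
    return result, flags_80x86, flags_32000


result, x86, ns = add8(0x80, 0x80)
print(hex(result))   # 0x0
print(x86["zero"])   # True: an 80x86 JZ could branch on this directly
print("zero" in ns)  # False: Series 32000 code needs an explicit compare
```

In translated code, a branch that depended on the zero or sign result of an ADD therefore needs an explicit compare instruction placed before it.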


Table I gives a general guideline of instruction correlation
between 80x86 and Series 32000. Many of the common
subroutines in 80x86 may be replaced by a single instruction
in Series 32000 (for example, 32-bit multiply and divide
routines). Many special purpose instructions exist in Series
32000, and these instructions may help to optimize various
algorithms.


3.0 IMPLEMENTATION


As an example, we will show some small 80x86 programs
which we wish to convert to Series 32000. The first program
reads a number of bytes from a port, waiting for status
information. Below is the program in 80x86 assembly language:

;This program reads count bytes from port ioport, waiting for bit 7 of
;statport to be active (1) before reading each byte.

        xor     bx,bx           ;zero checksum
        mov     cx,count        ;get count of bytes
        mov     es,bufseg       ;get buffer segment
        lea     di,buffer       ;point to buffer offset
l1:     mov     dx,statport     ;get status port address
l2:     in      al,dx           ;read status port
        rcl     al,1            ;move bit 7 to carry
        jnc     l2              ;loop until status available
        mov     dx,ioport       ;point to data port
        in      al,dx           ;read port
        stosb                   ;store byte
        xor     ah,ah           ;zero high part of ax
        add     bx,ax           ;add to checksum
        loop    l1              ;loop for all bytes
        ret
A direct translation of this program to Series 32000, using Table I, appears below. Note that this program will not work
directly: the 80x86 version relies on a side effect of the rcl instruction (bit 7 is moved into the carry flag), and the
Series 32000 rotate does not rotate through carry.

#This program reads count bytes from port ioport, waiting for bit 7 of
#statport to be active (1) before reading each byte.
#
# Before optimization

        xord    r1,r1           # zero checksum
        movw    $count,r2       # get count of bytes
        addr    buffer,r5       # point to buffer
ll1:    addr    statport,r3     # get status port address
ll2:    movb    0(r3),r0        # read status port
        rotb    $1,r0           # move bit 7 to carry <<- does not work
        bcc     ll2             # branch if carry clear
        addr    ioport,r3       # point to data port
        movb    0(r3),r0        # read port
        movb    r0,0(r5)        # store byte
        addqd   1,r5
        movzbw  r0,r0           # zero high part of ax
        addw    r0,r1           # add to checksum
        acbw    -1,r2,ll1       # loop for all bytes
        ret     $0
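
The control flow of the program above (poll bit 7 of the status port, read a data byte, store it, add it to a 16-bit checksum) can be modelled in a few lines of Python. This is only a sketch for checking the logic: the FakePort class, its ready_after parameter and the sample data are invented for the example and stand in for the real statport and ioport.

```python
class FakePort:
    """Hypothetical device: status bit 7 goes high after a few polls."""

    def __init__(self, data, ready_after=2):
        self._data = list(data)
        self._ready_after = ready_after
        self._polls = 0

    def read_status(self):
        self._polls += 1
        return 0x80 if self._polls > self._ready_after else 0x00

    def read_data(self):
        self._polls = 0              # the next byte is not ready at once
        return self._data.pop(0)


def read_block(port, count):
    """Read count bytes, waiting for status bit 7 before each one."""
    buffer = bytearray()
    checksum = 0
    for _ in range(count):
        while not (port.read_status() >> 7) & 1:   # test bit 7 directly
            pass
        byte = port.read_data()
        buffer.append(byte)
        checksum = (checksum + byte) & 0xFFFF      # 16-bit running sum
    return bytes(buffer), checksum


data, checksum = read_block(FakePort([0x10, 0xF0, 0xFF]), 3)
print(data.hex(), hex(checksum))   # 10f0ff 0x1ff
```

Testing the bit directly, rather than shifting it into a carry flag, is the step that does not depend on any instruction side effect.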
By using some of the special Series 32000 instructions, we
can make this program much faster. The ROTB will not work
to test status, so we will replace that with a TBITB
instruction. Since TBITB can directly address the port, there
is no need to read the status port value at all. We will remove
the read status port line, and the register load of r3. Reading
the IO port as well can be done directly now, and we use a
zero extension to ensure the high bits are cleared in
preparation for the checksum addition. Note that it is easy to
do a 32-bit checksum instead of only a 16-bit. Below is the
'optimized' code:

#This program reads count bytes from port ioport, waiting for bit 7 of
#statport to be active (1) before reading each byte.
#
# After optimization

        xord    r1,r1           # zero checksum
        movw    $count,r2       # get count of bytes
        addr    buffer,r5       # point to buffer
ll1:
ll2:    tbitb   $7,statport     # is bit 7 of status port valid?
        bfc     ll2             # no, loop until it is
        movzbd  ioport,r0       # read io port
        movb    r0,0(r5)        # store in buffer
        addqd   1,r5
        addw    r0,r1           # add to checksum
        acbw    -1,r2,ll1       # loop for all bytes
        ret     $0
                                                        TL/EE/9699-3

A second program shows, in 80x86 assembler, a method to
copy and convert a string from mixed case ASCII to all upper
case ASCII. This program is shown below:


;This program translates a null terminated ASCII string to uppercase

        mov     ds,buf1seg      ;point to input segment
        lea     si,buf1         ;point to input string
        mov     es,buf2seg      ;point to output segment
        lea     di,buf2         ;point to output string
        cld                     ;clear direction flag (increasing add)
l1:     lodsb                   ;get a byte
        cmp     al,'a'          ;is the char less than 'a'?
        jb      l2              ;yes, branch out
        cmp     al,'z'          ;is the char greater than 'z'?
        ja      l2              ;yes, branch out
        and     al,5fh          ;and with 5f to make uppercase
l2:     stosb                   ;store the character
        or      al,al           ;is this the last char?
        jnz     l1              ;no, loop for more
        ret                     ;yes, exit
                                                        TL/EE/9699-4
A direct translation to Series 32000 works fine, as is shown below:

#This program translates a null terminated ASCII string to uppercase
#
# Before optimization

        addr    buf1,r4         # point to input string
        addr    buf2,r5         # point to output string
ll1:    movb    0(r4),r0        # get a byte
        addqd   1,r4
        cmpb    $'a',r0         # is the char less than 'a'?
        blo     ll2             # yes, branch out
        cmpb    $'z',r0         # is the char greater than 'z'?
        bhi     ll2             # yes, branch out
        andb    $0x5f,r0        # and with 5f to make uppercase
ll2:    movb    r0,0(r5)        # store the character
        addqd   1,r5
        cmpqb   0,r0            # is this the last char?
        bne     ll1             # no, loop for more
        ret     $0

This program allows us to exploit another Series 32000
instruction, the MOVST (Move and String Translate). With a
256 byte external table, we can translate any byte to any
other byte. In this example, we simply use the full range of
ASCII values in the translation table, with the lower case
entries containing uppercase values.

#This program translates a null terminated ASCII string to uppercase
#
# After optimization

        movqd   -1,r0           # number of bytes in string max.
        addr    buf1,r1         # point to input string
        addr    buf2,r2         # point to output string
        addr    ctable,r3       # address of conversion table
        movqd   0,r4            # match on a zero
        movst   u               # move string, translate, until 0
        movqb   0,0(r2)         # move a zero to output string
        ret     $0
                                                        TL/EE/9699-7

Watch for other optimization opportunities, especially with
multiply and add sequences (the INDEXi instruction could
be used), and possible memory to memory sequence
changes. When optimizing Series 32000 code, it is important
to fully utilize the Complex Instruction Set. Allow the
fewest number of instructions possible to do the work. Use
the advanced addressing modes where possible. Try to
employ larger data types in programs (Series 32000 takes the
same number of clocks to add Bytes, Words or Double
words).

4.0 CONCLUSION


Series 32000 assembly language offers a much richer
complement of instructions when compared to the 80x86
assembly language. Translation from 80x86 to Series 32000 is
made much easier by this full instruction set.
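
The table-driven translation that MOVST performs in Section 3.0 can be tried out in Python, whose bytes.translate method also maps every byte through a 256-entry table. The sketch below is an analogy, not Series 32000 code: the names ctable and to_upper are illustrative, and stopping at the first zero byte imitates the "until match" option with a match value of 0.

```python
# Identity table for all 256 byte values, with the lower-case ASCII
# entries replaced by their upper-case equivalents.
ctable = bytearray(range(256))
for code in range(ord("a"), ord("z") + 1):
    ctable[code] = code & 0x5F          # same mask as the AND 5Fh version
ctable = bytes(ctable)


def to_upper(buffer):
    """Translate up to, but not including, the first zero byte."""
    end = buffer.find(b"\x00")
    if end < 0:
        end = len(buffer)
    return buffer[:end].translate(ctable) + b"\x00"   # re-append terminator


print(to_upper(b"Series 32000, mixed Case\x00ignored"))
# b'SERIES 32000, MIXED CASE\x00'
```

Because the table covers every byte value, changing the mapping (for example to fold accented characters) only means changing table entries, not the loop.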
TABLE I


The following is a conversion table from 80x86 mnemonics to Series 32000. Note that many of the conversions are not
exact, as the 80x86 instructions may affect flags that Series 32000 instructions do not. A * marks those instructions that may
be affected most by this change in flags. The i in the Series 32000 instructions refers to the size of the data to be operated
on. It may be B for Byte, W for Word or D for Double. Most arithmetic instructions also support F for single-precision
Floating-Point, and L for double-precision Floating-Point.

Series 32000


80x86
AAA 
AAD 
AAM 
AAS 
ADC 
ADD 
AND 
BOUND 
CALL 
CBW 
CLC 
CLD 
CLI 
CMC 
CMP 


JA/JNBE 


JAE/JNB


JB/JNAE 
JBE/JNA 
JCXZ 
JE/JZ 


ADDCi 
ADDi 
ANDi 
CHECKi 
BSR/JSR 
MOVXBW 


BICPSRB $1 


BICPSRW $0x800 


CMPi 
CMPSi 
MOVXWD 


ADDQi -1*
DIVi 


ENTER [reglist],d 


WAIT 
DIVi/QUOi 
MULi


ADDQi 1* 


SVC 
FLAG 
RETI $0 
BHI 
BHS 
BLO
BLS 


BEQ 
BGT 
BGE 
BLT 
BLE 
BR/JUMP 


BNE 


ADDR 


EXIT [reglist] 


MOVi/ADDQD 


ACBi -1


Suggest changing algorithm to use ADDPi 
Suggest changing algorithm to use ADDPi/SUBPi 


Suggest changing algorithm to use SUBPi 


You may directly sign-extend data while moving 
Usually not required 

Direction encoded within string instructions 
Supervisor mode instruction 

Usually not required 


Many options available 

You may directly sign-extend data while moving 

Suggest changing algorithm to use ADDPi 

Suggest changing algorithm to use SUBPi 

Watch for flag usage 

Note: Series 32000 uses signed division 

Builds stack frame, saves regs, allocates stack space 

Usually used for Floating Point; see Series 32000 FP instructions


DIVi rounds towards -infinity, QUOi to zero


Series 32000 uses memory-mapped I/O 
Watch for flag usage 

Series 32000 uses memory mapped I/O 
Not exact conversion, but usually used to call O/S 
Trap on overflow 

Causes Interrupt Acknowledge cycle 
Unsigned comparison 

Unsigned comparison 

Unsigned comparison 

Unsigned comparison 

Use CMPQi 0, followed by BEQ

Equal comparison 

Signed comparison 

Signed comparison 

Signed comparison 

Signed comparison 


mparison 


Subroutines should be used for these instructions 
as most Series 32000 code will not need these 
operations. 


SPRB UPSR,xxx may be useful 
Segment registers not required on Series 32000 


Restores regs, unallocates frame and stack 
Segment registers not required 

SBITIi, CBITIi interlocked instructions 

MOV instruction followed by address increment 
ACBi may use memory or register 
Series 32000


MOVi TOS, 


RESTORE [r0,r1 .. r7] 


LPRB UPSR,TOS 
MOVi xx, TOS 
SAVE [r0,r1 . . r7] 
SPRB UPSR,TOS 
ROTi*

ROTi*

RET

ROTi

ROTi


ASHi 

ASHi 

SUBCi 

SKPSi 

LSHi 

LSHi 

BISPSRB $1 
BISPSRW $0x800 
MOVi/ADDQD 
SUBi 


MOVi x[R0:b],
XORi 


TABLE I (Continued)


Comments 


BEQ followed by ACBi may be used 
BNE followed by ACBi may be used 
BNE followed by ACBi may be used 
BEQ followed by ACBi may be used 


Many options available 
Series 32000 uses signed multiplication 
Two’s complement 


One’s complement 


Series 32000 uses memory mapped I/O

Series 32000 uses memory mapped I/O

TOS addressing mode auto increments/decrements SP 
Restores list of registers 

User mode loads 8 bits, supervisor 16 bits of PSR 
Any data may be moved to TOS 

Saves list of registers 

User mode stores 8 bits, supervisor 16 bits of PSR 
Does not rotate through carry 

Does not rotate through carry 

Series 32000 string instructions use 32-bit counts 


Rotates work in both directions 
LPRB UPSR,xx may be useful 
Arithmetic shift 

Arithmetic shift works both directions 


Many options available 
Logical shift 
Logical shift works both directions 


Direction is encoded in string instructions 
Supervisor mode instruction 
MOV instruction followed by address increment 


TBITi may be used as a substitute 


MOVi x,temp; MOVi y,x; MOVi temp,y 
Scaled index addressing mode 
Bit Mirror Routine;
Series 32000® Graphics
Note 7

National Semiconductor
Application Note 530
Dave Rand


1.0 INTRODUCTION 


The bit mirror routine is designed to reorder the bits in an image. The bits are swapped around a fixed point, that being one 
half of the size of the data, as is shown for the byte mirror below. These routines can be used for conversion of 68000 based 
data. 


2.0 DESCRIPTION

                                                  Hex
Bit Number          7  6  5  4  3  2  1  0       Value
Source              1  0  1  1  0  0  1  0        B2
Result of Mirror    0  1  0  0  1  1  0  1        4D


The "mirror", in this case, is between bits 3 and 4.


Several different algorithms are available for the mirror operation. The best algorithm to mirror a byte takes 20 clocks on an
NS32016 (about 2.5 clocks per bit), and uses a 256 byte table to do the mirror operation. The table is reproduced at the end
of this document. To perform a byte mirror, the following code may be used. The byte to be mirrored is in R0, and the
destination is to be R1.


MOVB mirtab[r0:b],r1 #Mirror a byte 
TL/EE/9700-1 
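
For readers who want to check the table or regenerate it, here is a short Python sketch that builds the 256-entry mirror table and performs the same lookup as the MOVB instruction above. The function names are illustrative; only the table contents are meant to correspond to MIRTAB.

```python
def mirror_bits(value, width=8):
    """Reverse the order of the lowest `width` bits of value."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


# 256-entry table: mirtab[b] is b with bit 7 <-> bit 0, 6 <-> 1, and so on.
mirtab = bytes(mirror_bits(b) for b in range(256))


def mirror_byte(b):
    """Table lookup, the Python counterpart of MOVB mirtab[r0:b],r1."""
    return mirtab[b & 0xFF]


print(hex(mirror_byte(0xB2)))               # 0x4d
print([hex(v) for v in mirtab[:4]])         # ['0x0', '0x80', '0x40', '0xc0']
```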


An extension of this algorithm is used to mirror larger amounts of data. To mirror a 32-bit block of data from one location to
another, the following code may be used. Register R0 points to the source block, register R1 points to the destination. R2 is
used as a temporary value.


        MOVZBD  0(r0),r2                #get first byte
        MOVB    mirtab[r2:b],3(r1)      #store in last place
        MOVB    1(r0),r2                #get next byte
        MOVB    mirtab[r2:b],2(r1)      #store in next place
        MOVB    2(r0),r2                #get the third byte
        MOVB    mirtab[r2:b],1(r1)      #store in next place
        MOVB    3(r0),r2                #get the last byte
        MOVB    mirtab[r2:b],0(r1)      #first place

                                                        TL/EE/9700-2
This code uses 33 bytes of memory, and just 169 clocks to execute. Larger blocks of data can be mirrored with this method 
as well, with each additional byte taking about 40 clocks. 
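
The same idea in Python, as a hedged sketch: mirroring a 32-bit block means mirroring each byte and also reversing the byte order, which is why the assembly stores source byte 0 at destination offset 3. The helper name mirror_block is an assumption, and the table is built on the fly rather than copied from MIRTAB.

```python
# Build the byte mirror table by reversing each 8-bit binary string.
mirtab = bytes(int(f"{b:08b}"[::-1], 2) for b in range(256))


def mirror_block(source):
    """Mirror a block of bytes: reverse byte order, mirror every byte."""
    return bytes(mirtab[b] for b in reversed(source))


block = bytes([0xB2, 0x01, 0x00, 0x80])
print(mirror_block(block).hex())   # 0100804d
```

The function accepts blocks of any length, matching the remark that larger blocks can be mirrored one additional byte at a time.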


Registers can also be mirrored with this method, with just a few more instructions. To mirror R0 to R1, for example, the
following code could be used. R2 is used as a temporary variable.


        MOVZBD  r0,r2                   #get lsbyte
        MOVB    mirtab[r2:b],r1         #mirror the byte
        LSHD    $8,r1                   #move into higher byte of destination
        LSHD    $-8,r0                  #and of source
        MOVB    r0,r2                   #get lsbyte
        MOVB    mirtab[r2:b],r1         #mirror the byte
        LSHD    $8,r1                   #move into higher byte of destination
        LSHD    $-8,r0                  #and of source
        MOVB    r0,r2                   #get lsbyte
        MOVB    mirtab[r2:b],r1         #mirror the byte
        LSHD    $8,r1                   #move into higher byte of destination
        LSHD    $-8,r0                  #and of source
        MOVB    r0,r2                   #get lsbyte
        MOVB    mirtab[r2:b],r1         #mirror the byte

                                                        TL/EE/9700-3
This code occupies 49 bytes, and executes in 286 clocks on an NS32016.

If space is at a premium, a shorter table may be used, at the expense of time. Each nibble (4 bits) instead of each byte is
processed. This means that the table only requires 16 entries. To mirror a byte in R0 to R1, the following code can be used. R2
is used as a temporary variable.


        MOVB    r0,r2                   #get lsbyte
        ANDD    $15,r2                  #mask to get ls nibble
        MOVB    mirtb16[r2:b],r1        #mirror the nibble
        LSHD    $4,r1                   #high nibble of destination
        LSHD    $-4,r0                  #and of source
        MOVB    r0,r2                   #get lsbyte
        ANDD    $15,r2                  #mask to get ls nibble
        ORB     mirtb16[r2:b],r1        #mirror the nibble

                                                        TL/EE/9700-4


This code requires 32 bytes of memory, and executes in 125 clock cycles on an NS32016. A slightly faster time (100 clocks)
may be obtained by adding a second table for the high nibble, and eliminating the LSHD $4,r1 instruction.
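
The nibble-at-a-time variant can be sketched in Python as follows. The names mirtb16 and mirror_byte_nibbles are chosen to echo the assembly, but the code is an illustrative model under those naming assumptions rather than a translation of the listing.

```python
# 16-entry table: mirtb16[n] is the 4-bit value n with its bits reversed.
mirtb16 = bytes(int(f"{n:04b}"[::-1], 2) for n in range(16))


def mirror_byte_nibbles(b):
    """Mirror a byte with two 16-entry lookups instead of one 256-entry one."""
    low = b & 0x0F            # least significant nibble
    high = (b >> 4) & 0x0F    # most significant nibble
    # The mirrored low nibble becomes the high nibble of the result.
    return (mirtb16[low] << 4) | mirtb16[high]


print(list(mirtb16))
# [0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15]
print(hex(mirror_byte_nibbles(0xB2)))     # 0x4d
```

The second table mentioned above would simply hold these 16 values pre-shifted left by four bits, which removes the shift from the lookup path.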


TABLES 


MIRTAB is a table of all possible mirror values of 8 bits, or 256 bytes. MIRTB16 is a table of all possible mirror values of 4 bits, or 
16 bytes. These tables should be aligned for best performance. They may reside in code (PC relative), or data (SB relative) 
space. 


mirtab:
        .byte   0x00,0x80,0x40,0xc0,0x20,0xa0,0x60,0xe0,0x10,0x90,0x50
        .byte   0xd0,0x30,0xb0,0x70,0xf0
        .byte   0x08,0x88,0x48,0xc8,0x28,0xa8,0x68,0xe8,0x18,0x98,0x58
        .byte   0xd8,0x38,0xb8,0x78,0xf8
        .byte   0x04,0x84,0x44,0xc4,0x24,0xa4,0x64,0xe4,0x14,0x94,0x54
        .byte   0xd4,0x34,0xb4,0x74,0xf4
        .byte   0x0c,0x8c,0x4c,0xcc,0x2c,0xac,0x6c,0xec,0x1c,0x9c,0x5c
        .byte   0xdc,0x3c,0xbc,0x7c,0xfc
        .byte   0x02,0x82,0x42,0xc2,0x22,0xa2,0x62,0xe2,0x12,0x92,0x52
        .byte   0xd2,0x32,0xb2,0x72,0xf2
        .byte   0x0a,0x8a,0x4a,0xca,0x2a,0xaa,0x6a,0xea,0x1a,0x9a,0x5a
        .byte   0xda,0x3a,0xba,0x7a,0xfa
        .byte   0x06,0x86,0x46,0xc6,0x26,0xa6,0x66,0xe6,0x16,0x96,0x56
        .byte   0xd6,0x36,0xb6,0x76,0xf6
        .byte   0x0e,0x8e,0x4e,0xce,0x2e,0xae,0x6e,0xee,0x1e,0x9e,0x5e
        .byte   0xde,0x3e,0xbe,0x7e,0xfe
        .byte   0x01,0x81,0x41,0xc1,0x21,0xa1,0x61,0xe1,0x11,0x91,0x51
        .byte   0xd1,0x31,0xb1,0x71,0xf1
        .byte   0x09,0x89,0x49,0xc9,0x29,0xa9,0x69,0xe9,0x19,0x99,0x59
        .byte   0xd9,0x39,0xb9,0x79,0xf9
        .byte   0x05,0x85,0x45,0xc5,0x25,0xa5,0x65,0xe5,0x15,0x95,0x55
        .byte   0xd5,0x35,0xb5,0x75,0xf5
        .byte   0x0d,0x8d,0x4d,0xcd,0x2d,0xad,0x6d,0xed,0x1d,0x9d,0x5d
        .byte   0xdd,0x3d,0xbd,0x7d,0xfd
        .byte   0x03,0x83,0x43,0xc3,0x23,0xa3,0x63,0xe3,0x13,0x93,0x53
        .byte   0xd3,0x33,0xb3,0x73,0xf3
        .byte   0x0b,0x8b,0x4b,0xcb,0x2b,0xab,0x6b,0xeb,0x1b,0x9b,0x5b
        .byte   0xdb,0x3b,0xbb,0x7b,0xfb
        .byte   0x07,0x87,0x47,0xc7,0x27,0xa7,0x67,0xe7,0x17,0x97,0x57
        .byte   0xd7,0x37,0xb7,0x77,0xf7
        .byte   0x0f,0x8f,0x4f,0xcf,0x2f,0xaf,0x6f,0xef,0x1f,0x9f,0x5f
        .byte   0xdf,0x3f,0xbf,0x7f,0xff


mirtb16:
        .byte   0x0,0x8,0x4,0xc,0x2,0xa,0x6,0xe,0x1,0x9,0x5
        .byte   0xd,0x3,0xb,0x7,0xf
                                                        TL/EE/9700-5
Instruction Execution
Times of FPU NS32081
Considered for
Stand-Alone Configurations

National Semiconductor
Application Brief 26
Systems & Applications Group


The table below gives execution timing information for the
FPU NS32081.


The number of clock cycles nCLK is counted between the last
SPC pulse, which strobes the last operation word or operand
into the FPU, and the Done-SPC pulse, which signals the CPU
that the result is available (see Figure 1). The values are
therefore independent of the operand's addressing modes
and do not include the CPU/FPU protocol time. This makes
it easy to determine the FPU execution times in stand-alone
configurations.

The values are derived from measurements; the worst case
is always assumed. The results are given in clock cycles
(CLK).


[Table: Operation versus Number of Clock-Cycles nCLK. Only the row labels "Divide Float" and "Divide Long" survive; the cycle counts are not recoverable.]


[FIGURE 1: timing diagram of the SPC and CLK signals, showing the ID, OPCODE and OPERANDS transfers followed by the (DONE) STATUS and RESULT transfers. TL/EE/8760-1]
NATIONAL SEMICONDUCTOR CORPORATION DISTRIBUTORS


ALABAMA 
Huntsville 


Thousand Oaks 
Bell Industries 


Elk Grove Village 
Anthem Electronics 


Grand Rapids 
Arrow Electronics 


Arrow Electronics 
(205) 837-6955 
Bell Industries
(205) 837-1074 
Hamilton/Avnet 
(205) 837-7210 
Pioneer 

(205) 837-9300 


ARIZONA 


Phoenix 
Arrow Electronics 
(602) 437-0750 
Tempe 
Anthem Electronics 
(602) 966-6600 
Bell Industries 
(602) 966-7800 
Hamilton/Avnet 
(602) 961-6400 


CALIFORNIA 


Agora Hills 
Zeus Components 
(818) 889-3838 
Anaheim 
Time Electronics 
(714) 934-0911 
Zeus Components 
(714) 921-9000 
Chatsworth 
Anthem Electronics 
(818) 700-1000 
Arrow Electronics 
(818) 701-7500 


Hamilton Electro Sales 


(818) 700-6500 

Time Electronics 

(818) 998-7200 
Costa Mesa 

Avnet Electronics 

(714) 754-6050 


Hamilton Electro Sales 


(714) 641-4159 
Garden Grove 

Bell Industries 

(714) 895-7801 
Gardena 

Bell Industries 

(213) 515-1800 


Hamilton Electro Sales 


(213) 217-6751 
Irvine 
Anthem Electronics 
(714) 768-4444 
Ontario 
Hamilton/Avnet 
(714) 989-4602 
Rocklin 
Bell Industries 
(916) 969-3100 
Roseville 
Bell Industries 
(916) 969-3100 
Sacramento 
Anthem Electronics 
(916) 922-6800 
Hamilton/Avnet 
(916) 925-2216 
San Diego 
Anthem Electronics 
(619) 453-9005 
Arrow Electronics 
(619) 565-4800 
Hamilton/Avnet 
(619) 571-7510 
Time Electronics 
(619) 586-1331 
San Jose 
Anthem Electronics 
(408) 295-4200 
Zeus Components 
(408) 998-5121 
Sunnyvale 
Arrow Electronics 
(408) 745-6600 
Bell Industries
(408) 734-8570 
Hamilton/Avnet 
(408) 743-3355 
Time Electronics 
(408) 734-9888 


(805) 499-6821 
Torrance 

Time Electronics 

(213) 320-0880 
Tustin 

Arrow Electronics 

(714) 838-5422 
Yorba Linda 

Zeus Components 

(714) 921-9000 


COLORADO 
Englewood 
Anthem Electronics 
(303) 790-4500 
Arrow Electronics 
(303) 790-4444 
Hamilton/Avnet 
(303) 799-9998 
Wheatridge 
Bell Industries 
(303) 424-1985 


CONNECTICUT 
Cheshire 
Time Electronics 
(203) 271-3200 
Danbury 
Hamilton/Avnet 
(203) 797-2800 
Meridan 
Anthem Electronics 
(203) 237-2282 
Norwalk 
Pioneer Northeast 
(203) 853-1515 
Wallingford 
Arrow Electronics 
(203) 265-7741 


FLORIDA 
Altamonte Springs 
Arrow/Kierulff Electronics 
(305) 682-6923 
Pioneer 
(305) 834-9090 
Deerfield Beach 
Arrow Electronics 
(305) 429-8200 
Bell Industries 
(305) 421-1997 
Pioneer 
(305) 428-8877 
Fort Lauderdale 
Hamilton/Avnet 
(305) 971-2900 
Lake Mary 
Arrow Electronics 
(407) 323-0252 
Largo 
Bell Industries 
(813) 541-4434 
Orlando 
Chip Supply 
(305) 298-7100 
Oviedo 
Zeus Components 
(407) 365-3000 
Palm Bay 
Arrow Electronics 
(305) 725-1480 
St. Petersburg 
Hamilton/Avnet 
(813) 576-3930 
Winter Park 
Hamilton/Avnet 
(407) 628-3888 


GEORGIA 
Norcross 
Arrow Electronics 
(404) 449-8252 
Bell Industries
(404) 662-0923 
Hamilton/Avnet 
(404) 447-7500 
Pioneer 
(404) 448-1711 
ILLINOIS 
Bensenville 


Hamilton/Avnet 
(312) 860-7780 


(312) 640-6066 

Bell Industries 

(312) 640-1910 
Itasca 

Arrow Electronics 

(312) 250-0500 
Urbana 

Bell Industries 

(217) 328-1077 


INDIANA 

Carmel 
Hamilton/Avnet 
(317) 844-9333 

Fort Wayne 
Bell Industries 
(219) 423-3422 

Indianapolis 


Advent Electronics Inc. 


(317) 872-4910 
Arrow Electronics 
(317) 243-9353 
Bell Industries 
(317) 634-8202 
Pioneer 

(317) 849-7300 


IOWA 
Cedar Rapids 

Advent Electronics 
(319) 363-0221 
Arrow Electronics 
(319) 395-7230 
Bell Industries 
(319) 395-0730 
Hamilton/Avnet 
(319) 362-4757 


KANSAS 
Lenexa 
Arrow Electronics 
(913) 541-9542 
Overland Park 
Hamilton/Avnet 
(913) 888-8900 


MARYLAND 
Columbia 


Anthem Electronics 


(301) 995-6640 
Arrow Electronics 
(301) 995-0003 
Hamilton/Avnet 
(301) 995-3500 
Lionex Corp. 
(301) 964-0040 
Time Electronics 
(301) 964-3090 
Zeus Components 
(301) 997-1118 
Gaithersburg 
Pioneer 
(301) 921-0660 


MASSACHUSETTS 
Lexington 
Pioneer Northeast 
(617) 861-9200 
Zeus Components 
(617) 863-8800 
Norwood 
Gerber Electronics 
(617) 769-6000 
Peabody 
Hamilton/Avnet 
(617) 531-7430 


Sertech Laboratories 


(617) 532-5105 

Time Electronics 

(617) 532-6200 
Wilmington 


Anthem Electronics 


(617) 657-5170 
Arrow Electronics 
(617) 935-5134 
Lionex Corporation 
(617) 657-5170 


MICHIGAN 
Ann Arbor 
Arrow Electronics 
(313) 971-8220 
Bell Industries 
(313) 971-9093 


(616) 243-0912 
Hamilton/Avnet 
(616) 243-8805 
Pioneer Standard 
(616) 698-1800 
Livonia 
Hamilton/Avnet 
(313) 522-4700 
Pioneer 
(313) 525-1800 
Wyoming 
R. M. Michigan, Inc. 
(616) 531-9300 


MINNESOTA 

Eden Prairie 
Anthem Electronics 
(612) 944-5454 
Pioneer-Twin Cities 
(612) 944-3355 

Edina 
Arrow Electronics 
(612) 830-1800 

Minnetonka 
Hamilton/Avnet 
(612) 932-0600 


MISSOURI 

Earth City 
Hamilton/Avnet
(314) 344-1200 

St. Louis 
Arrow Electronics 
(314) 567-6888 
Time Electronics 
(314) 391-6444 


NEW HAMPSHIRE 
Hudson 
Bell Industries 
(603) 882-1133 
Manchester 
Arrow Electronics 
(603) 668-6968 
Hamilton/Avnet
(603) 624-9400 


NEW JERSEY 
Cherry Hill
Hamilton/Avnet
(609) 424-0100
Fairfield
Hamilton/Avnet
(201) 575-3390 
Lionex Corporation 
(201) 227-7960 
Marlton 
Arrow Electronics 
(609) 596-8000 
Parsippany 
Arrow Electronics 
(201) 538-0900 
Pine Brook 
Nu Horizons Electronics 
(201) 882-8300 
Pioneer 
(201) 575-3510 


NEW MEXICO 
Albuquerque 
Alliance Electronics Inc. 
(505) 292-3360 
Arrow Electronics 
(505) 243-4566 
Bell Industries
(505) 292-2700 
Hamilton/Avnet 
(505) 765-1500 


NEW YORK 
Amityville 
Nu Horizons Electronics 
(516) 226-6000 
Binghamton 
Pioneer 
(607) 722-9300 
Buffalo 
Summit Distributors 
(716) 887-2800 
Fairport 
Pioneer Northeast 
(716) 381-7070 


NEW YORK (Continued) 
Hauppauge 
Anthem Electronics 
(516) 273-1660 
Arrow Electronics 
(516) 231-1000 
Hamilton/Avnet 
(516) 434-7413 
Lionex Corporation 
(516) 273-1660 
Time Electronics 
(516) 273-0100 
Port Chester
Zeus Components 
(919) 937-7400 
Rochester 
Arrow Electronics 
(716) 427-0300 
Hamilton/Avnet 
(716) 475-9130 
Summit Electronics 
(716) 334-8100 
Ronkonkoma 
Zeus Components 
(516) 737-4500 
Syracuse 
Hamilton/Avnet 
(315) 437-2641 
Time Electronics 
(315) 432-0355 
Westbury 
Hamilton/Avnet 
(516) 997-6868 


NORTH CAROLINA 
Charlotte 
Pioneer 
(704) 527-8188 
Durham 
Pioneer Technology 
(919) 544-5400 
Raleigh 
Arrow Electronics 
(919) 876-3132 
Hamilton/Avnet
(919) 878-0810 
Winston-Salem 
Arrow Electronics 
(919) 725-8711 


OHIO 

Centerville 
Arrow Electronics 
(513) 435-5563 

Cleveland 
Pioneer 
(216) 587-3600 

Dayton 
Bell Industries
(513) 435-8660 
Bell Industries 
(513) 434-8231 
Hamilton/Avnet 
(513) 439-6700 
Pioneer 
(513) 236-9900 
Zeus Components 
(914) 937-7400 


Highland Heights 
CAM/Ohio Electronics 
(216) 461-4700 

Solon 
Arrow Electronics 
(216) 248-3990 
Hamilton/Avnet 
(216) 831-3500 

Westerville 
Hamilton/Avnet 
(614) 882-7004 


OKLAHOMA 
Tulsa 

Arrow Electronics 
(918) 252-7537 
Hamilton/Avnet 
(918) 252-7297 
Quality Components 
(918) 664-8812 
Radio Inc. 
(918) 587-9123 


OREGON 

Beaverton 
Almac-Stroum Electronics 
(503) 629-8090 
Anthem Electronics 
(503) 643-1114 
Arrow Electronics 
(503) 645-6456 

Lake Oswego 
Bell Industries
(503) 241-4115 
Hamilton/Avnet 
(503) 635-7850 


PENNSYLVANIA 
Horsham 
Anthem Electronics 
(215) 443-5150 
Lionex Corp. 
(215) 443-5150 
Pioneer 
(215) 674-4000 
King of Prussia 
Time Electronics 
(215) 337-0900 
Monroeville 
Arrow Electronics 
(412) 856-7000 
Pittsburgh 
Hamilton/Avnet 
(412) 281-4150 
Pioneer 
(412) 782-2300 
CAM/RPC Ind. Elec. 
(412) 782-3770 
TEXAS 
Addison 
Quality Components 
(214) 733-4300 


NATIONAL SEMICONDUCTOR CORPORATION DISTRIBUTORS (Continued) 


Austin 

Arrow Electronics 

(512) 835-4180 

Hamilton/Avnet 

(512) 837-8911 

Pioneer 

(512) 835-4000

Quality Components 

(512) 835-0220 

Minco Technology Labs 

(512) 834-2022 
Carrollton 

Arrow Electronics 

(214) 380-6464 
Dallas 

Pioneer 

(214) 386-7300 
Houston 

Arrow Electronics 

(713) 530-4700 

Pioneer 

(713) 988-5555 
Irving 

Hamilton/Avnet 

(214) 550-7755 
Richardson 

Anthem Electronics 

(214) 238-0237 

Zeus Components 

(214) 783-7010 
Stafford 

Hamilton/Avnet 

(713) 240-7733 
Sugarland 

Quality Components 

(713) 240-2255 


UTAH 

Midvale 
Bell Industries 
(801) 972-6969 

Salt Lake City 
Anthem Electronics 
(801) 973-8555 
Arrow Electronics 
(801) 973-6913 
Bell Industries 
(801) 972-6969 
Hamilton/Avnet 
(801) 972-4300 


WASHINGTON 
Bellevue 
Almac-Stroum Electronics 
(206) 643-9992 
Hamilton/Avnet 
(206) 453-5844 
Kent 
Arrow Electronics 
(206) 575-4420 
Redmond 
Anthem Electronics 
(206) 881-0850 
Hamilton/Avnet 
(206) 867-0148 


WISCONSIN 
Brookfield 
Arrow Electronics 
(414) 792-0150 
Mequon 
Taylor Electric 
(414) 241-4321 
Waukesha 
Bell Industries 
(414) 547-8879 
Hamilton/Avnet 
(414) 784-4516 


CANADA 
WESTERN PROVINCES 
Burnaby 
Hamilton/Avnet 
(604) 437-6667 
Semad Electronics 
(604) 438-2515 
Calgary 
Hamilton/Avnet 
(403) 250-9380 
Semad Electronics 
(403) 252-5664 
Zentronics 
(403) 272-1021 
Edmonton 
Zentronics 
(403) 468-9306 
Richmond 
Zentronics 
(604) 273-5575 
Saskatoon 
Zentronics 
(306) 955-2207 
Winnipeg 
Zentronics 
(204) 694-1957 


EASTERN PROVINCES 
Brampton 
Zentronics 
(416) 451-9600 
Mississauga 
Hamilton/Avnet 
(416) 677-7432 
Nepean 
Hamilton/Avnet 
(613) 226-1700 
Zentronics 
(613) 226-8840 
Ottawa 
Semad Electronics 
(613) 727-8325 
Pointe Claire 
Semad Electronics 
(514) 694-0860 
St. Laurent 
Hamilton/Avnet 
(514) 335-1000 
Zentronics 
(514) 737-9700 
Waterloo 
Zentronics 
(800) 387-2329 
Willowdale 
ElectroSound Inc. 
(416) 494-1666 
2900 Semiconductor Drive
P.O. Box 58090
Santa Clara, CA 95052-8090
Tel: (408) 724-5000
ALABAMA
Huntsville
(205) 837-8960
(205) 721-9367
ARIZONA
Tempe 
(602) 966-4563 


B.C. 
Burnaby 
(604) 435-8107


CALIFORNIA 
Encino 
(818) 888-2602 
Inglewood 
(213) 645-4226 
Roseville
(916) 786-5577
San Diego
(619) 587-0666
Santa Clara 
(408) 562-5900 
Tustin 
(714) 259-8880 
Woodland Hills 
(818) 888-2602


COLORADO 

Boulder
(303) 440-3400
Colorado Springs
(303) 578-3319
Englewood
(303) 790-8090


CONNECTICUT 
Fairfield
(203) 371-0181
Hamden
(203) 288-1560
FLORIDA
Boca Raton
(305) 997-8133
Orlando 
(305) 629-1720 
St. Petersburg 
(813) 577-1380 


GEORGIA 
Atlanta
(404) 396-4048
Norcross
(404) 441-2740
ILLINOIS


Schaumburg 
(312) 397-8777 


INDIANA
Carmel 
(317) 843-7160 
Fort Wayne 
(219) 484-0722 


IOWA
Cedar Rapids
(319) 395-0090


KANSAS
Overland Park
(913) 451-8374


MARYLAND
Hanover
(301) 796-8900
MASSACHUSETTS
Burlington
(617) 273-3170
Waltham
(617) 890-4000


MICHIGAN
W. Bloomfield
(313) 855-0166


MINNESOTA
Bloomington
(612) 835-3322
(612) 854-8200


NEW JERSEY 
Paramus 
(201) 599-0955 
NEW MEXICO 
Albuquerque 
(505) 884-5601 


NEW YORK 
Endicott 
(607) 757-0200 
Fairport 
(716) 425-1358 
(716) 223-7700 
Melville 
(516) 351-1000 
Wappinger Falls
(914) 298-0680 


NORTH CAROLINA 
Cary 
(919) 481-4311 


OHIO 
Dayton 
(513) 435-6886
Highland Heights 
(216) 442-1555 
(216) 461-0191 


ONTARIO 
Mississauga 
(416) 678-2920
Nepean
(404) 441-2740 
(613) 596-0411 
Woodbridge 
(416) 746-7120 
OREGON 
Portland 
(503) 639-5442 


PENNSYLVANIA 
Horsham 
(215) 675-6111
Willow Grove 
(215) 657-2711 


PUERTO RICO 


Rio Piedras
(809) 758-9211
QUEBEC
Dollard Des Ormeaux
(514) 683-0683 
Lachine 
(514) 636-8525 
TEXAS 
Austin 
(512) 346-3990 
Houston
(713) 771-3547 
Richardson 
(214) 234-3811 
UTAH
Salt Lake City
(801) 322-4747


WASHINGTON
Bellevue
(206) 453-9944


WISCONSIN 

Brookfield 
(414) 782-1818 

Milwaukee 
(414) 527-3800 


INTERNATIONAL 
OFFICES 


Electronica NSC de Mexico SA 
Juventino Rosas No. 118-2 

Col Guadalupe Inn 

Mexico, 01020 D.F. Mexico 

Tel: 52-5-524-9402 


National Semicondutores 
Do Brasil Ltda. 

Av. Brig. Faria Lima, 1409 

6 Andor Salas 62/64 

01451 Sao Paulo, SP, Brasil 
Tel: (65/11) 212-5066 
Telex: 391-1131931 NSBR BR 


National Semiconductor GmbH 
Industriestrasse 10 

D-8080 Furstenfeldbruck 

West Germany 

Tel: 49-08141-103-0 

Telex: 527 649 


National Semiconductor (UK) Ltd. 
301 Harpur Centre 

Horne Lane 

Bedford MK40 1TR

United Kingdom 

Tel: (02 34) 27 00 27 

Telex: 826 209 


National Semiconductor Benelux 
Vorstlaan 100 
B-1170 Brussels 
Belgium 
Tel: (02) 6725360 
Telex: 61007


National Semiconductor (UK) Ltd.
1, Bianco Lunos Alle 
DK-1868 Fredriksberg C 
Denmark 
Tel: (01) 213211 
Telex: 15179 


National Semiconductor
Expansion 10000
28, rue de la Redoute
F-92260 Fontenay-aux-Roses
France
Tel: (01) 46 60 81 40
Telex: 250956


National Semiconductor S.p.A.

Strada 7, Palazzo R/3

20089 Rozzano
Milanofiori

Italy 

Tel: (02) 8242046/7/8/9 


National Semiconductor AB 
Box 2016 

Stensatravagen 13 

S-12702 Skarholmen 


Sweden

Tel: (08) 970190
Telex: 10731


National Semiconductor 
Calle Agustin de Foxa, 27 
28036 Madrid 

Spain 

Tel: (01) 733-2958 

Telex: 46133 


National Semiconductor Switzerland
Alte Winterthurerstrasse 53

Postfach 567
CH-8304 Wallisellen-Zurich

Switzerland 

Tel: (01) 830-2727 

Telex: 59000 


National Semiconductor 
Kauppakartanonkatu 7 
SF-00930 Helsinki 
Finland 

Tel: (0) 33 80 33 


Telex: 126116 


National Semiconductor Japan Ltd. 
Sanseido Bldg. 5F 

4-15 Nishi Shinjuku 

Shinjuku-ku 

Tokyo 160 Japan 

Tel: 3-299-7001 

Fax: 3-299-7000 


National Semiconductor 
Hong Kong Ltd. 
Southeast Asia Marketing 
Austin Tower, 4th Floor 
22-26A Austin Avenue 
Tsimshatsui, Kowloon, H.K. 
Tel: 852 3-7243645
Cable: NSSEAMKTG 
Telex: 52996 NSSEA HX 


National Semiconductor 
(Australia) PTY, Ltd. 

1st Floor, 441 St. Kilda Rd. 
Melbourne, 3004 

Victoria, Australia

Tel: (03) 267-5000 

Fax: 61-3-2677458 


National Semiconductor (PTE), Ltd. 
200 Cantonment Road 13-01 
Southpoint 

Singapore 0208 

Tel: 2252226 

Telex: RS 33877 


National Semiconductor (Far East) Ltd. 
Taiwan Branch 

P.O. Box 68-332 Taipei 

7th Floor, Nan Shan Life Bldg.

302 Min Chuan East Road, 

Taipei, Taiwan R.O.C. 

Tel: (86) 02-501-7227 

Telex: 22837 NSTW 

Cable: NSTW TAIPEI 


National Semiconductor (Far East) Ltd. 
Korea Office 

Room 612, 

Korea Fed. of Small Bus. Bldg.

16-2, Yoido-Dong, 

Youngdeungpo-Ku 


Seoul, Korea 


Tel: (02) 784-8051/3 - 785-0696-8 
Telex: K24942 NSRKLO 




